[mahout-user] Validating clustering output

Subject: Re: Validating clustering output
From: Ted Dunning <ted.dunning@...>
Date: 2009-07-28 00:48
On Mon, Jul 27, 2009 at 6:51 PM, Benson Margulies <bimargulies@gmail.com> wrote:

> [brown and mercer did hard stuff] Of course, you aren't proposing that, just recommending the bigram entropy metric or something like it.
Peter Brown and Bob Mercer were very sharp dudes, and when they did this work it was 100 times more amazing than it is now. They had the advantage of working for a company that understood that the resources you give researchers now should be 20 times more than you would expect a user to have in 5 years, but even so, their achievements were quite something.

Frankly, that record of achievement leads back beyond them to Fred Jelinek, Lalit Bahl and Salim Roukos and all the other early guys who worked on speech back then. That work (along with the BBN team under Jim and Janet Baker) gave us the entire framework of HMMs and entropy-based evaluation that is core to speech systems today. It leads forward to some of the really fabulous work that the della Pietra brothers did as well.

I owe the IBM team my interest in statistical approaches to AI and symbolic sequences. It was on a visit to IBM in 1990 or so that Stephen (or Vincent) dP mentioned off-handedly to me that mutual information was "trivially known to be chi-squared distributed asymptotically". That was news to me and formed the basis of a LOT of the work that I have done in the intervening 19 years.

--
Ted Dunning, CTO DeepDyve
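
For readers who want to see the chi-squared remark in concrete terms, here is a minimal sketch, not taken from the message itself. It relies on the standard G-test relationship: for a contingency table of N counts, 2 * N * MI (with mutual information MI in nats) is asymptotically chi-squared distributed under independence, with (rows - 1) * (columns - 1) degrees of freedom. The function names and the example counts are made up for illustration, and the p-value helper only covers the one-degree-of-freedom case of a 2x2 table.

```python
import math


def mutual_information(table):
    """Mutual information (in nats) between the row and column
    variables of a table of co-occurrence counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, k in enumerate(row):
            if k > 0:  # empty cells contribute nothing to the sum
                mi += (k / n) * math.log(k * n / (row_totals[i] * col_totals[j]))
    return mi


def g_statistic(table):
    """G = 2 * N * MI, the log-likelihood ratio statistic."""
    n = sum(sum(row) for row in table)
    return 2.0 * n * mutual_information(table)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with one
    degree of freedom (the 2x2 case only)."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical counts: rows and columns are two binary events.
independent = [[10, 20], [20, 40]]  # rows are proportional, so MI is zero
associated = [[30, 10], [10, 30]]   # strong diagonal association

g_independent = g_statistic(independent)
g_associated = g_statistic(associated)
p_associated = chi2_sf_1df(g_associated)
```

A large G with a small tail probability suggests the two events co-occur more (or less) often than independence would predict, which is why the statistic is useful for scoring things like bigrams.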
