July 23, 2009

July 16, 2009
eWeek.com recently posted a nice article by Dr. Yves Schabes, founder of Teragram, on how to make enterprise search better through higher-order processing techniques such as metadata generation and applying taxonomies, and through regular relevance testing. Naturally, this got me thinking about all the different ways this relates to the Apache Lucene ecosystem (Lucene, Solr, Mahout, Tika, etc.) and Lucid Imagination.
First, by choosing an open backbone like Lucene and Solr, you are free…
March 28, 2009
Apache Nutch, a subproject of Apache Lucene, is open source web-search software. It builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.
Apache Nutch 1.0 contains almost 200 resolved issues and improvements, including Solr integration, a new indexing framework, and a new scoring framework, to mention just a few.
Nutch 1.0 is available from here.