March 28, 2009
Apache Nutch, a subproject of Apache Lucene, is open source web-search software. It builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats.
Apache Nutch 1.0 contains almost 200 resolved issues and improvements, such as Solr integration, a new indexing framework, and a new scoring framework, to mention just a few.
Nutch 1.0 is available from here.
Read more...
March 20, 2009
Luke, the very popular Lucene index exploration and modification tool, has released a new version, 0.9.2.
Changes in v. 0.9.2 (released on 2009.03.20):
This release upgrades to Lucene 2.4.1 jars.
- New features and improvements:
  - Added term counts per field in Overview – contributed by Mark Harwood.
  - Improved the Analysis plugin to show all token information, and highlight whenever a token is selected from the list.
- Bug fixes:
Read more...
March 18, 2009
Previous: Exploring Lucene’s Indexing Code: Part 1
A trace of addDocument is pretty intense, so I think we are going to have to start at an even higher level.
Using some basic IR knowledge, we know that addDocument is going to use our Analyzer to break up each field in the given document, and use the resulting terms to build an inverted index. At its simplest, an inverted index might just be a list of postings, mapping…
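To make the idea of a postings list concrete, here is a minimal sketch of an inverted index in plain Java. It is not Lucene's implementation: the class name TinyInvertedIndex, its methods, and the lowercase-and-split tokenizer (a crude stand-in for an Analyzer) are all assumptions made for illustration. Each term simply maps to the list of document ids that contain it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; Lucene's real index structures are far more involved.
class TinyInvertedIndex {
    // term -> ids of the documents containing it, in the order they were added
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Assumes each docId is added once and ids arrive in increasing order.
    void addDocument(int docId, String text) {
        // Crude stand-in for an Analyzer: lowercase, then split on non-alphanumerics.
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (term.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Record each document at most once per term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    // Returns the postings list for a term, or an empty list if the term is unknown.
    List<Integer> postings(String term) {
        List<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? Collections.<Integer>emptyList()
                            : Collections.unmodifiableList(docs);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument(0, "Lucene builds an inverted index");
        index.addDocument(1, "An index maps terms to documents");
        System.out.println("index  -> " + index.postings("index"));
        System.out.println("lucene -> " + index.postings("lucene"));
    }
}
```

A real index would also track positions, term frequencies, and per-field data; this sketch keeps only the term-to-documents mapping described above.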
Read more...
March 16, 2009
Just a reminder that Erik and Grant are offering Lucene and Solr training at ApacheCon Europe next week. Grant’s class is a 2-day hands-on training on Lucene designed to get you up and working with Lucene and provide information about where to go next. Erik’s class is a 1-day session on getting up and running with Solr.
Also, note both Erik and I will be at the Lucene meetup on Tuesday night!
Read more...
March 12, 2009
InformationWeek has us on the ballot for top tech startups. They’ll unveil the Startup 50 winners in mid-April.
The editorial staff will make the final selection based on reader votes and our analysis of several criteria: innovation and the companies’ ability to inject new ways of doing things into business processes; value, which is reflected in lower costs, increased sales, higher productivity, or improved customer loyalty; and enterprise-readiness, meaning that a product or service scales and…
Read more...
March 11, 2009
Lucene in Action, 2nd edition is now available through the Manning Early Access Program. We’ve arranged an exclusive discount for our readers on either the print book plus ebook or just the ebook. Simply enter the code lucene40 to get 40% off the book until April 1, 2009.
Lucene in Action, Second Edition, completely revises and updates the best-selling first edition and remains the authoritative book on Lucene. This book shows you how to index your documents, including types such as…
Read more...
March 9, 2009
Lucene 2.4.1 has been released. It is a bug-fix release; Lucene 2.9 will follow next.
Read more...
March 4, 2009
While I have mucked around quite a bit in the search side code of Lucene, I am much less familiar with the hardcore indexing side (I’m talking about the hardcore code; casual users need not apply, unless you’re interested). I’d like to learn more about Lucene’s indexing code, but it’s not so easy to wrap my mind around at first glance. In instances like this, I find it’s best to start from a high level and work my way in, hopefully understanding the overall process, and then each of the pieces.
To help me get a handle on what IndexWriter does, I am going to trace a few key methods from a very simple Lucene test application that simply adds one small document to an index with an IndexWriter and then closes the IndexWriter.
Read more...