Nested Queries in Solr

The ability to nest an arbitrary query type inside another query type is a useful feature that was quietly added to Solr some time ago, along with the support for query parser plugins to support different query types. I finally got around to fixing nested queries for the function query parser, and figured it was high time I documented nested queries, along with the LocalParams syntax that allows one to add metadata to a query parameter, or even…

Read more...

Apache Nutch 1.0 released

Apache Nutch, a subproject of Apache Lucene, is open source web-search software. It builds on Lucene Java, adding web-specifics such as a crawler, a link-graph database, and parsers for HTML and other document formats.

Apache Nutch 1.0 contains almost 200 resolved issues and improvements, including Solr integration, a new indexing framework, and a new scoring framework, to mention just a few.

Nutch 1.0 is available from here.

Read more...

New Version of Luke Released

Luke, the very popular Lucene index exploration and modification tool, has released a new version, 0.9.2.

Changes in v. 0.9.2 (released on 2009.03.20):

This release upgrades to Lucene 2.4.1 jars.

  • New features and improvements:
    • Added term counts per field in Overview – contributed by Mark Harwood.
    • Improved the Analysis plugin to show all token information, and highlight whenever a token is selected from the list.
  • Bug fixes:
    • (None)

Read more...

Exploring Lucene’s Indexing Code: Part 2

Previous: Exploring Lucene’s Indexing Code: Part 1

A trace of addDocument is pretty intense, so I think we are going to have to start at an even higher level.

Using some basic IR knowledge, we know that addDocument is going to use our Analyzer to break up each field in the given document, and use the resulting terms to build an inverted index. At its simplest, an inverted index might just be a list of postings, mapping…
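
As a rough illustration of the idea in the excerpt above, here is a minimal sketch of a toy inverted index in Python. It is not Lucene's implementation: the `analyze` function is a hypothetical stand-in for an Analyzer (it only lowercases and splits on non-alphanumeric characters), and the postings layout (document id plus term positions) is an assumption chosen for readability.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy stand-in for an Analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_inverted_index(docs):
    """Map each term to a list of postings: (doc_id, [positions])."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for position, term in enumerate(analyze(text)):
            postings = index[term]
            if postings and postings[-1][0] == doc_id:
                # Term already seen in this document: record another position.
                postings[-1][1].append(position)
            else:
                # First occurrence of the term in this document.
                postings.append((doc_id, [position]))
    return dict(index)


docs = ["Lucene indexes text", "Solr builds on Lucene"]
index = build_inverted_index(docs)
print(index["lucene"])
```

Looking up a term then returns the documents that contain it, which is the core operation a search over the index relies on.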

Read more...

Lucene and Solr training at ApacheCon Europe

Just a reminder that Erik and Grant are offering Lucene and Solr training at ApacheCon Europe next week. Grant’s class is a 2-day hands-on training on Lucene designed to get you up and working with Lucene and provide information about where to go next. Erik’s class is a 1-day session on getting up and running with Solr.

Also, note both Erik and I will be at the Lucene meetup on Tuesday night!

Read more...

Lucid Imagination nominated for top tech startup

InformationWeek has us on the ballot for top tech startups. They’ll unveil the Startup 50 winners in mid-April.

The editorial staff will make the final selection based on reader votes and our analysis of several criteria: innovation and the companies’ ability to inject new ways of doing things into business processes; value, which is reflected in lower costs, increased sales, higher productivity, or improved customer loyalty; and enterprise-readiness, meaning that a product or service scales and…

Read more...

Lucene in Action, 2nd edition, available!

Lucene in Action, 2nd edition, is now available through the Manning Early Access Program. We’ve arranged an exclusive discount for our readers on either the print book plus ebook or the ebook alone. Simply enter the code lucene40 to get 40% off the book until April 1, 2009.

Lucene in Action, Second Edition, completely revises and updates the best-selling first edition and remains the authoritative book on Lucene. This book shows you how to index your documents, including types such as…

Read more...

Lucene 2.4.1 Released

Lucene 2.4.1 has been released. It is a bug-fix release; Lucene 2.9 will follow next.

Read more...

Using Nutch with Solr

The last time I wrote about integrating Apache Nutch with Apache Solr (about two years ago), it was quite difficult to integrate the two components – you had to apply patches, hunt down required components from various places, etc. Now there is an easier way. The soon-to-be-released Nutch 1.0 contains Solr integration “out of the box”. There are many different ways to take advantage of this new feature, but I am just going to go through…

Read more...

Exploring Lucene’s Indexing Code: Part 1

While I have mucked around quite a bit in the search-side code of Lucene, I am much less familiar with the hardcore indexing side (I’m talking the hardcore code, casual users need not apply – unless you’re interested). I’d like to learn more about Lucene’s indexing code, but it’s not so easy to wrap my mind around at first glance. In instances like this, I find it’s best to start from a high level and work my way in, hopefully understanding the overall process, and then each of the pieces.

To help me get a handle on what IndexWriter does, I am going to trace a few key methods from a very simple Lucene test application that simply adds one small document to an index with an IndexWriter and then closes the IndexWriter.

Read more...