July 23, 2009

July 16, 2009
eWeek.com recently posted a nice article by Dr. Yves Schabes, founder of Teragram, on how to make enterprise search better through higher-order processing techniques such as metadata generation and applying taxonomies, and through regular relevance testing. Naturally, this got me thinking about all the different ways this relates to the Apache Lucene ecosystem (Lucene, Solr, Mahout, Tika, etc.) and Lucid Imagination.
First, by choosing an open backbone like Lucene and Solr, you are free…
July 6, 2009
Solr 1.4 contains a new feature that allows range queries or range filters over arbitrary functions. It’s implemented as a standard Solr QParser plugin, and thus easily available for use any place that accepts the standard Solr Query Syntax by specifying the frange query type. Here’s an example of a filter specifying the lower and upper bounds for a function:
fq={!frange l=0 u=2.2}log(sum(user_ranking,editor_ranking))
The other interesting use for frange is to trade off memory for speed when doing…
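As a rough illustration of how the frange filter above might be sent from a client, here is a minimal Python sketch that assembles the fq value and URL-encodes it. The helper name frange_filter, the match-all q value, and the base URL http://localhost:8983/solr/select are assumptions made for this example, not something taken from the post; only the {!frange l=... u=...} syntax and the example function come from the text above.

```python
from urllib.parse import urlencode, parse_qs


def frange_filter(function, lower=None, upper=None):
    """Build an fq value in the {!frange l=... u=...}function form.

    Hypothetical helper for illustration; either bound may be omitted.
    """
    parts = ["!frange"]
    if lower is not None:
        parts.append(f"l={lower}")
    if upper is not None:
        parts.append(f"u={upper}")
    return "{" + " ".join(parts) + "}" + function


# Reproduce the filter from the post.
fq = frange_filter("log(sum(user_ranking,editor_ranking))", lower=0, upper=2.2)

# URL-encode the parameters; braces, spaces, and parentheses must be escaped.
query_string = urlencode({"q": "*:*", "fq": fq})

# Assumed base URL for a local Solr instance; adjust for your deployment.
url = "http://localhost:8983/solr/select?" + query_string
print(fq)
print(url)
```

Building the filter as a plain string and letting urlencode handle the escaping keeps the local-params syntax readable in code while still producing a valid request URL.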
May 24, 2009
Here’s the announcement from the PyLucene team:
This is a refresher release of Apache PyLucene 2.4.1 that addresses a few bugs and annoyances:
http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_2_4_1/CHANGES
http://svn.apache.org/repos/asf/lucene/pylucene/tags/pylucene_2_4_1/jcc/CHANGES
Apache PyLucene 2.4.1 is available from the following download page:
http://www.apache.org/dyn/closer.cgi/lucene/pylucene/pylucene-2.4.1-2-src.tar.gz
When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site:
http://www.apache.org/dist/lucene/pylucene/KEYS
For more information on Apache PyLucene, visit the project home page:
http://lucene.apache.org/pylucene
May 13, 2009
Recently, Uwe Schindler and others have added a new capability to Lucene and Solr to make working with numeric ranges a lot faster. I haven’t tried out this new functionality yet, so I thought I would walk through it here and explore its capabilities.
Since Lucene treats most everything as Strings, encoding numbers and dates and then utilizing them in ranges has always required a little extra work to make it perform well. Previously, one would…
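To see why numbers stored as plain strings need extra work, here is a small Python sketch of the underlying ordering problem. It is not Lucene code and not a reconstruction of the approach the excerpt goes on to describe; zero-padding is simply one common workaround, assumed here for illustration, and the pad helper and its width are hypothetical. It only handles non-negative integers.

```python
values = [2, 10, 33, 100]

# Compared as plain strings, numbers sort lexicographically, not numerically.
as_strings = sorted(str(v) for v in values)
print(as_strings)  # ['10', '100', '2', '33']


def pad(n, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order.

    Hypothetical helper; the width is an arbitrary choice for this example.
    """
    return str(n).zfill(width)


# A string range check on the raw text misses values that are numerically in range.
naive_hits = [v for v in values if "5" <= str(v) <= "50"]
print(naive_hits)  # []

# The same string comparison on padded terms behaves like a numeric range.
padded_hits = [v for v in values if pad(5) <= pad(v) <= pad(50)]
print(padded_hits)  # [10, 33]
```

The padded form makes string comparison agree with numeric comparison, which is the property any string-based range query over numbers depends on.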
February 28, 2009
It looks like the next release of Lucene is going to be 2.4.1, a bug fix release. The Lucene release ‘animal’ has raised its head over the previous months on two occasions, once eyeing a 2.4.1 release, then refocusing on a 2.9 release. Time has seen 2.4.1 land a few more bugs than we had thought, so it looks like 2.4.1 is in the final wrap-up stages and 2.9 will come next.
2.9 will likely be…
February 22, 2009
There are a surprising number of query parser options in the Lucene/Solr world – not something I realized very quickly in my early Lucene days. I thought I might highlight a few of the options out there.