While I was working at Lucid Imagination, a customer once asked me how they could modify the score of documents in Solr so that the most relevant results appear higher in the results list. While trying to answer the question, I realized that there are many different options, and that not all of them are easy to understand, so I decided to write some notes summarizing the most commonly used ways to …
Read more
Wildcard query terms aren’t analyzed, why is that?
Prior to the current 3x branch (which will be released as 3.6) and the trunk (4.0) Solr code, users were frequently perplexed that wildcard query terms were not analyzed, a behavior that most often shows up as unexpected case sensitivity. Say you have an analysis chain in your schema.xml file defined as follows and a field named lc_field of this type:
<fieldType name="lowercase" class="solr.TextField" >
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
Now, you index …
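The case-sensitivity symptom described above can be illustrated with a small simulation. This is only a sketch in plain Python, not Solr code: `analyze` and `wildcard_search` are hypothetical helpers that mimic a whitespace tokenizer plus lowercase filter at index time, and a wildcard pattern that is matched against the index without being analyzed.

```python
from fnmatch import fnmatchcase


def analyze(text):
    """Mimic the field type above: split on whitespace, then lowercase."""
    return [token.lower() for token in text.split()]


def wildcard_search(indexed_terms, pattern, analyze_pattern=False):
    """Return the indexed terms that match a wildcard pattern.

    Matching is case-sensitive. When analyze_pattern is False the pattern
    is used exactly as typed, which models an un-analyzed wildcard term.
    """
    if analyze_pattern:
        pattern = pattern.lower()
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]


# Index-time analysis lowercases every token.
indexed = analyze("Solr Search Server")

# The raw pattern keeps its capital "S" and so misses the lowercased term.
unanalyzed_hits = wildcard_search(indexed, "Sol*")

# Lowercasing the pattern first lets it match the indexed term.
analyzed_hits = wildcard_search(indexed, "Sol*", analyze_pattern=True)

print(unanalyzed_hits, analyzed_hits)
```

The mismatch arises only because the indexed terms went through the lowercase filter while the wildcard pattern did not.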
Read more
Tuesday, 29 November 2011 to Friday, 2 December 2011

I’ll be speaking at the upcoming Rich Web Experience conference in Ft. Lauderdale, presenting an “Introduction to Solr”, “Solr Recipes”, and “Lucene for Solr Developers”. I’ll be tying all of these presentations together into a cohesive search/Solr track going from the introduction, to recipes for common tasks, through advanced customization of Solr.…
Read more
Here are my ApacheCon 2011 slides for my talk “Bet You Didn’t Know Lucene Can…”:
Read more
The use of scripting languages to add new functionality to systems is something that I’ve always found very helpful. You don’t have to download the source code of the system: if it has “scriptable” parts, you can add simple functionality in minutes without even compiling. Java provides this capability, in particular with JavaScript. You can refer to http://java.sun.com/developer/technicalArticles/J2SE/Desktop/scripting/ for more information on this.
Unfortunately, the only engine bundled with Java 6 is Rhino, which converts the JavaScript …
Read more
With another Lucene Eurocon successfully behind us (thanks Barcelona, you’ve been awesome!), it’s time to say hello to Vancouver for ApacheCon. I’ll leave it to others to fill in the blanks on the Barcelona conference other than to say that I am continually amazed by the vibrancy of the Lucene/Solr community and especially grateful to all the committers and contributors who take the time to show up and give talks about how they leverage …
Read more
If you’re running Apache Solr in production, you count on it to deliver solid performance and expect it to be up at all times. Even if you tested your setup with the expected data and query load, things can go wrong. Solving those problems as they appear not only causes service downtime but is also a very unpleasant task. Imagine sleepless nights trying to figure out why your production system went down with an OutOfMemory error. Similar …
Read more
From a quiet start as a pet project to a giant in the industry, Apache Lucene is definitely the little (search) engine that could. On September 18th, 2001 (at 16:29:48 UTC), Jason Van Zyl made the first official import of Doug Cutting’s Lucene project (which started in 1997 and was hosted on SourceForge) into Apache’s Jakarta project (check out the Wayback Machine).
And while I wasn’t around in the beginning, I thought I would …
Read more
One month from today we’ll be kicking off Apache Lucene Eurocon in Barcelona, and I will once again be in the hot seat for a session of Stump The Chump.
During the session, moderator and former “Chump” Grant Ingersoll will present me with tough Lucene/Solr questions submitted by users, to see what kind of solutions I can come up with on the spot. A panel of judges will award prizes for questions that “stump” …
Read more
By yonik, September 15, 2011
Background
I needed a really good hash function for the distributed indexing we’re implementing for Solr. Since it will be used for partitioning documents, it needed to be really high quality (well distributed), because we don’t want uneven shards. It also needed to be cross-platform, so that a client could calculate the hash value itself if desired, to predict which node has a given document.
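To make the partitioning requirement concrete, here is a minimal sketch of hash-based shard assignment. It is not Solr’s implementation: `shard_for` is a hypothetical helper, the shard count of 4 is arbitrary, and MD5 from Python’s standard library stands in for the hash function purely because it produces the same bytes on every platform.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index in the range [0, num_shards).

    MD5 is only a stand-in here (an assumption of this sketch); any
    well-distributed hash that is stable across platforms would do.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    hash32 = int.from_bytes(digest[:4], "big")  # first 32 bits of the digest
    return hash32 % num_shards


# Count how many of 10,000 synthetic ids land on each of 4 shards.
counts = [0] * 4
for i in range(10_000):
    counts[shard_for(f"doc-{i}", 4)] += 1

print(counts)
```

Because the function depends only on the id bytes and the shard count, a client can run the same calculation to predict where a document lives. Python’s built-in `hash()` would not work for this, since string hashes are randomized per process.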
MurmurHash3
MurmurHash3 is one of the top favorite new hash functions …
Read more