Solr’s New Clustering Capabilities

Introduction

One of the new things in Solr 1.4 that I am particularly excited about is the new document and search results clustering capability.  This is an optional module that lives in Solr’s contrib/clustering directory and was added via SOLR-769.  The module is designed to let people either use the existing clustering capabilities (currently only search result clustering is offered, via Carrot2) or plug in their own.  While some of…

Read more...

Lucene 2.9 is released

Hello Lucene users,
On behalf of the Lucene dev community (a growing community far larger
than just the committers) I would like to announce the release of
Lucene 2.9.
 
While we generally try and maintain full backwards compatibility
between major versions, Lucene 2.9 has a variety of breaks that are
spelled out in the ‘Changes in backwards compatibility policy’ section
of CHANGES.txt.
 
We

Read more...

Webinar: “Apache Lucene 2.9: Discover the Powerful New Features” presented by Grant Ingersoll

Thursday, 24 September 2009
11:00 to 12:00

Lucene 2.9 offers a rich set of new features and performance improvements alongside plentiful fixes and optimizations. If you are a Java developer building search applications with the Lucene search library, this webinar provides the insights you need to harness the power of this important update to Apache Lucene.

Grant will present and discuss key technical features and innovations including:

  • Real-time/per-segment searching and caching
  • Built in numeric

Read more...

Contrived FieldCache Load Test: Lucene 2.4 VS Lucene 2.9

*edit* Sorry – jumped the gun with my original test code here – I needed to close the IndexWriter after the optimize! The gains are only with multi-segment indexes. Corrected entry follows:

Let's do a little test. We will load up a FieldCache with 5,000,000 unique strings and see how long it takes Lucene 2.4 in comparison to Lucene 2.9.

Let's use my quad-core laptop and the following test code:

public…

Read more...

Lucene 2.9 Release Vote Has Begun

It took a couple more RCs than I guessed (5 total), but the final vote candidate is up, and unless something critical is found during the 3-day vote process, Lucene 2.9, almost a year in the making, will be available by the end of the week.

http://search.lucidimagination.com/search/document/f15d32710b70ca6b/vote_release_lucene_2_9_0

Read more...

Save $200 and visit us at Enterprise Search Summit West

Tuesday, 17 November 2009 to Thursday, 19 November 2009

Save $200 and visit Lucid Imagination at

Enterprise Search Summit West
November 17-19, 2009
San Jose McEnery Convention Center
http://enterprisesearchsummit.com/west2009/

Save $200 off the conference pass and get a free Expo pass.
Go here:
https://secure.infotoday.com/forms/default.aspx?form=esswest
and use discount code: VIPLI

Read more...

Java Garbage Collection Boot Camp (Draft)

I’m working on a Garbage Collection article – I figured I’d share an early rough draft:

It’s not often the case, but sometimes when working with a large and busy Solr/Lucene installation, Garbage Collection becomes a bottleneck. This guide is meant to help you relieve that bottleneck should it arise.
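Before tuning anything, it helps to confirm that garbage collection really is the bottleneck. The following is a minimal sketch, not part of the original draft, that uses the standard java.lang.management API to estimate how much of the JVM's uptime has gone to collection. The class name GcOverhead and its two helper methods are hypothetical names chosen for illustration, and the reported times are approximate (collectors may report concurrent work as well as pauses), so treat the result as a rough indicator only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration: reports approximate GC overhead of the running JVM.
public class GcOverhead {

    // Sum of the approximate accumulated collection time (ms) across all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Rough fraction of JVM uptime attributed to garbage collection.
    static double gcFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime(); // ms since JVM start
        return uptime <= 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("Approximate share of uptime spent in GC: %.2f%%%n",
                gcFraction() * 100.0);
    }
}
```

Calling these helpers from inside the JVM under test (for example from a servlet or a scheduled task) gives a quick first reading; a share that stays at a few percent suggests looking elsewhere for the slowdown.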

Garbage collection in Java is the process of freeing the memory used by objects that are no longer in…

Read more...

JavaZone

First, major kudos to the JavaZone team for putting on one of the most impressive conferences I’ve attended and had the honor of speaking at.  From the tasty and hearty speakers’ dinner to Carl delivering the power cable I absent-mindedly left behind after my talk to my hotel, the organizers put on a very well-organized and classy event.  The conference not only sported top-notch technical content from the best of the best…

Read more...

Partner Event: ISYS Webinar, Enterprise Search and Information Access

Thursday, 17 September 2009
09:00 to 10:00

Join guest presenter Brian Pinkerton, Chief Architect of Lucid Imagination, as he makes an appearance with our partners at ISYS, at their Webinar entitled The Changing Face of Enterprise Search and Information Access. ISYS provides the ISYS File Readers, an embeddable set of over 200 file filters to extract text and other metadata from different file formats that you can download for trial from our Downloads page.

Read more...

Posting Rich Documents to Apache Solr using SolrJ and Solr Cell (Apache Tika)

Solr Cell, a new feature in the soon-to-be-released Solr 1.4, allows users to send rich documents such as MS Word and Adobe PDF directly to Solr and have them indexed for search.  All of the examples on the Solr Cell wiki page, however, only demonstrate how to send in the documents using the curl command-line utility, while many Solr users rely on SolrJ, Solr’s Java-based client.  Thus, I thought I…

Read more...