A billion here, a billion there

“…pretty soon you’re talking about real money”, goes the famous phrase attributed to the late Senator Everett Dirksen (apocryphally, it turns out).

The phrase came to mind while reading a recent post from Tom Burton-West of the Hathi Trust about running into an index-size limitation of 2.47 billion words on a base of 555,000 documents.

When we read that the Lucene index format used by Solr has a limit of 2.1 billion unique words per index segment, we didn’t think we…

Open Source: Whiter Teeth, Fresher Breath?

Some years ago, when open source was the fairly-long-haired hairshirt scruffy shorts-wearing barbarian at the gate, there was real sturm-und-drang around droll, Berkeley-esque phrases like “copy-left” and “viral licensing”, enough to make some people wonder if these open source types used deodorant. And the RIAA, god bless ’em, was running around suing high school students and other customers who had the temerity to take a different view of the traditional marriage of digital content and…

Lucene 2.9.2 and 3.0.1

The vote is on for what I think is a Lucene first: two simultaneous bug-fix releases. Because the Lucene 2 series is the last to support Java 1.4, we are doing a bug-fix release for 2.9 as well as for the recently released 3.0, which requires Java 1.5.

A little preview from the proposed release announcement:

Important improvements in these releases are an increased maximum number of unique terms in each index segment. They…

Lucene and Logs: Update

A couple more notes on this subject since the Webinar from a couple of weeks ago:

Steve Arnold of Beyond Search asks in a blog post:

…the notion of integrating log files is a good one but I wondered how long it takes to suck big log files, determine deltas, and then update the indexes.

We’ve offered some of the information from the Webinar in a case study we’ve posted about our work with Boomi:

The logging-and-searching service is characterized by…

Shocked? Microsoft holding FAST to Windows and dot-net

“I am shocked, SHOCKED, to find out that gambling is going on here,” says Claude Rains as Captain Louis Renault, feigning disbelief while busting up Rick’s Cafe in the classic film Casablanca, right before the croupier hands him a wad of bills.

I think we were all equally “shocked” to learn that Microsoft would drop support for FAST ESP (Enterprise Search Platform) on anything but Windows, as they announced this past week. But for…

New Lucid Imagination beta website now up, feedback welcome!

You are invited to visit the new LucidImagination.com, now up for public beta.

What’s new? We have improved the navigation with a simplified, streamlined hierarchy; provided more relevant links and content across the site; and added new content to help those interested in using Lucene/Solr open source search to drive business growth.

Search Application Developers:
We have developed a focused resource area which will be at developer.lucidimagination.com. Here you will find loads of technical information, resources and downloads for Lucene and…

Search by the Book

By now, many of you have had the opportunity to use the online, searchable version of the LucidWorks Certified Distribution Reference Guide for Solr 1.4. In this post, I’ll describe how we took the original document version of the Reference Guide (LWCDRG) and transformed it into an online resource searchable through Solr.

I hope that you might find this useful if you are faced with creating a similar, online searchable service from existing documents.

The Seven Deadly Sins of Solr

Working at Lucid Imagination gives me the opportunity to analyze and evaluate a great many Solr implementations, running in some of the largest Fortune 500 companies as well as some of the smallest start-ups. This experience has enabled me to identify many common mistakes and pitfalls that occur either when starting out with a new Solr implementation or when not keeping up with the latest improvements and changes. Thanks to my colleague Simon Rosenthal…

Training Class: Introduction to Search Development with Solr

Monday, 15 February 2010 to Wednesday, 17 February 2010

Introduction to Search Development with Solr
Search Development with Solr by Lucid Imagination is a three-day, instructor-led, hands-on classroom training course, written and led by the engineers who helped write the Lucene/Solr code. The objective of this course is to provide you with real use cases and teach you how to apply Solr search engine technologies to business requirements. During the course you will learn to apply best practices…

Solr Search User Interface Examples

A recent Slashdot poster asked for Solr-powered “Attractive Open Source Search Interfaces”. First, for some inspiration on what you might want to have in a search user interface, check out Peter Morville’s excellent set of screenshot examples. One of my favorite examples is, of course, from the library space. Morville showcases the NCSU library system site on one of his sets:

Several Solr-powered open source faceted navigation search systems for libraries have been built with various technologies: Blacklight (Ruby…
