
March 19, 2010

Actual mileage may vary

Posted by David M. Fishman

A few weeks after Microsoft announced that FAST would no longer be available on Linux/Unix, interesting stories continue to pop up about the use of Lucene and Solr in its place. The most recent is a benchmark from Technology Services Group, an open source content management consultancy and integration shop based in Chicago. In a blog post earlier this week, they describe a proof of concept for a large pharmaceutical client that benchmarked search on 156,000 documents in an external data source indexed by Lucene. The search application was part of a larger CMS solution centered on EMC Documentum.

Lucene/HPI [the TSG Documentum Lucene-based solution] and the external repository was found to be considerably quicker than the existing FAST/Webtop implementation on most queries.

Specific results:

Query           FAST/Webtop   Lucene/HPI
1200 results    90 seconds    3 seconds
8 results       5 seconds     3 seconds
10 results      8 seconds     4 seconds
76 results      10 seconds    5 seconds
5100 results    72 seconds    5 seconds
65 results      6 seconds     3 seconds
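
To put the table in perspective, here is a small sketch that works out the speedup ratio for each query. The timings are copied from the results above; the function and variable names are just illustrative, not anything from the TSG post.

```python
# Timings in seconds, copied from the table above:
# (number of results, FAST/Webtop, Lucene/HPI)
timings = [
    (1200, 90, 3),
    (8, 5, 3),
    (10, 8, 4),
    (76, 10, 5),
    (5100, 72, 5),
    (65, 6, 3),
]


def speedups(rows):
    """Return (result count, FAST time divided by Lucene time) per query."""
    return [(count, fast / lucene) for count, fast, lucene in rows]


for count, ratio in speedups(timings):
    print(f"{count:>5} results: {ratio:.1f}x faster")
```

The two largest result sets account for the dramatic ratios; the smaller queries come out at roughly 2x.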

Simple configuration of the Lucene index did a better job of returning a more complete search result set than the standard FAST/Webtop configuration. Examples included additional documents that were logical derivatives of the initial search word. For example – a search for “exception report” could return “exceptions report” or “exception reports”. The proof of concept data set also included German documents and Lucene demonstrated multilingual stemming capability.
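
The "exception report" example is what stemming buys you: terms are reduced to a common root at both index time and query time, so plural and singular forms land on the same entry. Below is a toy sketch of the idea. It is not Lucene's analyzer and not the TSG configuration; `naive_stem` is a hypothetical stand-in that only strips a trailing plural "s", and the sample documents are made up.

```python
def naive_stem(token):
    """Hypothetical toy stemmer: lowercase and strip a trailing plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    """Split on whitespace and stem each token."""
    return [naive_stem(token) for token in text.split()]


def build_index(docs):
    """Build an inverted index: stemmed term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every stemmed query term (AND match)."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Made-up sample documents for illustration only.
docs = {
    1: "exception report",
    2: "exceptions report",
    3: "exception reports",
    4: "status summary",
}
index = build_index(docs)
print(sorted(search(index, "exception report")))
```

A real analyzer chain does far more (language-specific stemmers, stop words, phrase positions), which is presumably where the German support mentioned above comes from.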

A better-than-10x reduction on the two largest result sets (90 seconds down to 3, and 72 down to 5) sure sounds sweet. Now, with any benchmark, the devil is in the details: lies, damned lies, and benchmarks. They’re tougher to construct objectively than a sweet set of outputs might imply. And so for me, the real punchline is in a different set of numbers:

The flexibility of Lucene to index both the metadata and full-text values allowed the client to avoid adding an additional Oracle database to their external cache for attribute storage.

One less check to Oracle — that’s real money.
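
For the curious, here is a rough sketch of why one index can stand in for a separate attribute store: metadata fields and full text can live side by side in the same inverted index, keyed by field name, so a single query can filter on attributes and match on text. This is a simplified illustration, not Lucene's API or TSG's design; the field names and sample documents are assumptions made up for the example.

```python
from collections import defaultdict


def build_fielded_index(docs):
    """Map (field, term) -> set of doc ids, for metadata and body text alike."""
    fielded_index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            for term in str(value).lower().split():
                fielded_index[(field, term)].add(doc_id)
    return fielded_index


def search_fields(fielded_index, **criteria):
    """Return ids of documents matching every field/term pair in criteria."""
    postings = []
    for field, value in criteria.items():
        for term in str(value).lower().split():
            postings.append(fielded_index.get((field, term), set()))
    return set.intersection(*postings) if postings else set()


# Hypothetical documents: two metadata attributes plus a full-text body.
sample_docs = {
    "d1": {"author": "smith", "doc_type": "sop", "body": "Exception report for batch release"},
    "d2": {"author": "jones", "doc_type": "sop", "body": "Cleaning procedure"},
    "d3": {"author": "smith", "doc_type": "memo", "body": "Exception report follow-up"},
}
fielded_index = build_fielded_index(sample_docs)

# One query combines an attribute filter with a full-text match.
print(search_fields(fielded_index, doc_type="sop", body="exception report"))
```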


2 Responses to “Actual mileage may vary”

  1. I have a hard time understanding how you could get results that slow on a relatively small corpus — not just for FAST, but Lucene. Were they doing a ton of extra search-time processing (like facets) or something?

    March 19, 2010 11:05 — Matt Chaput

  2. Indeed, Matt — a good observation. Might be interesting to hear from the TSG folks; my sense is that there’s a bunch of CMS overhead in here, but I’m reading between the lines. The workload itself is not something that generally causes Solr to break a sweat.

    March 19, 2010 16:49 — David Fishman


Apache Solr, Solr, Apache Lucene, Lucene and their logos are trademarks of the Apache Software Foundation.

© 2011 Lucid Imagination. All rights reserved.
