September 22, 2009

Contrived FieldCache Load Test: Lucene 2.4 VS Lucene 2.9

Posted by Mark Miller

*edit* Sorry, I jumped the gun with my original test code here: the IndexWriter needs to be closed after the optimize! The gains apply only to multi-segment indexes. The corrected entry follows:

Let's do a little test. We will load up a FieldCache with 5,000,000 unique strings and see how long it takes in Lucene 2.4 compared with Lucene 2.9.

Let's use my quad-core laptop and the following test code:

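import java.io.File;
import junit.framework.TestCase;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ContrivedFCTest extends TestCase {
  public void testLoadTime() throws Exception {
    Directory dir = FSDirectory.getDirectory(System.getProperty("java.io.tmpdir") + File.separator + "test");
    IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
    // High merge factor so the index ends up with many segments.
    writer.setMergeFactor(37);
    writer.setUseCompoundFile(false);
    // Index 5,000,000 documents, each holding one unique, un-analyzed string.
    for (int i = 0; i < 5000000; i++) {
      Document doc = new Document();
      doc.add(new Field("field", "String" + i, Field.Store.NO, Field.Index.NOT_ANALYZED));
      writer.addDocument(doc);
    }
    writer.close();

    IndexReader reader = IndexReader.open(dir);
    // Time only the FieldCache load over the whole (multi-segment) reader.
    long start = System.currentTimeMillis();
    FieldCache.DEFAULT.getStrings(reader, "field");
    long end = System.currentTimeMillis();
    System.out.println("load time:" + (end - start)/1000.0f + "s");
    reader.close();
  }
}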

The results?

Lucene 2.4: 150.726s
Lucene 2.9: 9.695s

We discovered early this year that Lucene has long been terribly inefficient at loading FieldCaches over multiple segments. Lucene 2.9 addresses this at the MultiReader level (thank you Yonik!). Also, internal FieldCache usage is now per segment, which sidesteps loading FieldCaches over multiple segments altogether: each segment has its own FieldCache.
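To get a feel for why one cache per segment can be cheaper than one cache built across all segments, here is a toy model in plain Java. It is only a sketch and uses no Lucene classes: Segment, Result, loadAcrossSegments and loadPerSegment are hypothetical names, and the cost model (one dictionary lookup in every segment for every merged term) is an assumption made for illustration, not a description of Lucene's actual internals. It also assumes every document holds a unique value, as in the test above.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Toy model only: plain arrays stand in for index segments; no Lucene classes are used. */
final class SegmentCacheSketch {

  /** Hypothetical segment: a sorted term dictionary plus the local doc id of each term. */
  static final class Segment {
    final String[] terms; // sorted, unique terms
    final int[] docs;     // docs[i] = local doc id that holds terms[i]

    Segment(String... valuesByDoc) {
      Integer[] order = new Integer[valuesByDoc.length];
      for (int i = 0; i < order.length; i++) order[i] = i;
      // Sort doc ids by their value to build the term dictionary.
      Arrays.sort(order, (a, b) -> valuesByDoc[a].compareTo(valuesByDoc[b]));
      terms = new String[order.length];
      docs = new int[order.length];
      for (int i = 0; i < order.length; i++) {
        terms[i] = valuesByDoc[order[i]];
        docs[i] = order[i];
      }
    }
  }

  /** Cache contents plus the number of dictionary lookups it took to build them. */
  static final class Result {
    final String[] values; // value per global doc id
    final long seeks;
    Result(String[] values, long seeks) { this.values = values; this.seeks = seeks; }
  }

  /** One cache over all segments: walk the merged terms and look each one up in every segment. */
  static Result loadAcrossSegments(Segment[] segments) {
    int totalDocs = 0;
    TreeSet<String> merged = new TreeSet<>();
    for (Segment s : segments) {
      totalDocs += s.terms.length;
      merged.addAll(Arrays.asList(s.terms));
    }
    String[] values = new String[totalDocs];
    long seeks = 0;
    for (String term : merged) {
      int base = 0; // global doc id of the current segment's first document
      for (Segment s : segments) {
        seeks++; // assumed cost: one lookup per term per segment
        int pos = Arrays.binarySearch(s.terms, term);
        if (pos >= 0) values[base + s.docs[pos]] = term;
        base += s.terms.length;
      }
    }
    return new Result(values, seeks);
  }

  /** One cache per segment: a single sequential pass over each dictionary, no lookups. */
  static String[][] loadPerSegment(Segment[] segments) {
    String[][] caches = new String[segments.length][];
    for (int k = 0; k < segments.length; k++) {
      Segment s = segments[k];
      caches[k] = new String[s.terms.length];
      for (int i = 0; i < s.terms.length; i++) caches[k][s.docs[i]] = s.terms[i];
    }
    return caches;
  }

  public static void main(String[] args) {
    Segment[] segments = {
      new Segment("String2", "String0", "String1"),
      new Segment("String4", "String3")
    };
    Result across = loadAcrossSegments(segments);
    System.out.println("lookups across segments: " + across.seeks);
    System.out.println("per-segment caches: " + Arrays.deepToString(loadPerSegment(segments)));
  }
}
```

With the two toy segments in main, the cross-segment load performs 10 lookups (5 terms times 2 segments), while the per-segment load performs none. Under this assumed cost model the lookup count grows with terms times segments, which matches the observation that the gap only shows up on multi-segment indexes.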


6 Responses to “Contrived FieldCache Load Test: Lucene 2.4 VS Lucene 2.9”

  1. This was our biggest issue by far. It's taken significant load off our servers when installing a new snapshot.

    Thanks indeed Yonik.

    September 22, 2009 16:16 — Jim Murphy - PostRank

  2. Why did you use merge factor of 37?

    September 23, 2009 19:46 — Anonymous

  3. mergeFactor=37: presumably to keep segments from being merged during indexing, making it possible to show off the new and faster segment reloading.

    September 24, 2009 10:15 — Otis Gospodnetic

  4. Partially, since I am just timing the loading of the FieldCache (and not doing it per segment). It's just to make sure I have a bunch of segments. It's only faster over multiple segments; it's the same speed on an optimized index. The more segments there are, the bigger the speedup.

    The reason that it's also faster when you do it per segment (how Lucene works internally now) is that it avoids the speed trap that was in MultiTermEnum and uses SegmentTermEnum. Yonik fixed that as well though, and this test shows the fruits of that. So essentially, it was both fixed and sidestepped at the same time ;)

    September 24, 2009 11:10 — Mark Miller

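  5. [...]   Once correctness is assured, the next concern is performance. My guess is that, because IKVM has to add a layer of runtime and JDK between the .NET assemblies generated from Java and the BCL, its performance will almost certainly be worse than that of the original Java program. For a project like Lucene, however, the algorithm is the key to performance. For example, someone measured Lucene 2.9.0 as roughly 15 times faster than 2.4 in certain cases. But lacking good test data and scenarios, so far I have run only the very simplest performance comparison, one that involves no disk I/O. [...]

    September 24, 2010 07:49 — Trying to run Lucene 2.9.0 with IKVM | 彭旭赣州seo优化

  6. [...] Mark Miller of Imagination ran a simple performance test showing that, with 5,000,000 distinct strings, Lucene [...]

    September 25, 2010 03:37 — Improvements in Apache Lucene 2.9 | 彭旭赣州seo优化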
