May 31, 2009

Built-In Solr Index Replication with Solr 1.4

Posted by Mark Miller

Replication has always been one of Solr’s cooler features, but it’s been hampered by the Unix features it employs. Unix scripts mixed in with run-(almost)-anywhere Java are enough to make anyone sigh, and users of Solr on Windows have been somewhat left out in the cold. That’s all changing though, because Solr 1.4 will bring a new, built-in replication feature that works as a Solr RequestHandler.

The authors of the new RequestHandler have posted some valuable info, benchmarking the new replication scheme against the older script-based replication. New users looking to start with large indices are always looking for this type of info. Seeing how long it takes to transfer a 2 gig index on a fairly normal setup will give you a gut feeling for the transfer time you can expect on your own setup (keeping in mind that the whole index will not always be transferred unless you optimize first every time).

Transfer Time

You can find this image at the bottom of http://wiki.apache.org/solr/SolrReplication

Looking at the graph, you can see that the old-style scripts method with rsync is a tad slower, but not enough to really matter. It’s nice to know the new built-in replication is a small gain rather than a small loss though.

One thing missing from the given info is the specs of the systems/network that were used. We can play with some numbers and make some guesses.

There is another diagram on the SolrReplication page that gives us the exact numbers. At 2100 MB in 217 seconds, the index was moved at 9.68 MB per second using the new built-in replication method. That’s a bit over 7 minutes for 4 gigabytes. Is that a normal number, or were they using RAID 0 Super Drives and 100 Gigabit networks?

Well, we know that the two main bottlenecks are going to be the hard drive speed and the network speed. Here, it looks like one of the two topped out at almost 10 MB per second. Normal?
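The arithmetic behind those figures can be checked with a short Python sketch. The function names are made up for this illustration, and "4 gigabytes" is assumed to mean 4 × 1024 MB.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s for a transfer of size_mb taking `seconds`."""
    return size_mb / seconds


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time in seconds to move size_mb at a steady rate_mb_per_s."""
    return size_mb / rate_mb_per_s


# Numbers quoted from the SolrReplication wiki page: 2100 MB in 217 seconds.
rate = throughput_mb_per_s(2100, 217)

# Assumption: 4 gigabytes = 4 * 1024 MB.
four_gb_seconds = transfer_seconds(4 * 1024, rate)

print(f"{rate:.2f} MB/s")
print(f"{four_gb_seconds / 60:.1f} minutes for 4 GB")
```

Plugging in your own index size and a measured rate gives the same kind of rough estimate for your setup.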

Let’s start with the drive. We can look up what a first-gen Serial ATA drive can do on Wikipedia: 150 MB/s. Unfortunately, that’s the theoretical maximum speed of the bus. The maximum sustained transfer speed of the drive itself will fall far short. Too bad, because a 6 Gbps interface was just demo’d by Seagate (doubling SATA2). We are limited to the drives though, and after reading lots of random hearsay, it looks like you can expect about 25-30 MB/s on a 5400rpm laptop drive, and about 50-60 MB/s on a standard 7200rpm drive (sustained transfer rates). Or you can jump to the high end and get something like these Raptor drives that claim sustained transfer rates over 100 MB/s. That doesn’t look like our bottleneck. 60 MB/s is 4 gigabytes in about a minute, 8 seconds, or a gig in 17 seconds.
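To sanity-check the drive numbers, here is a minimal Python sketch that turns a sustained rate into a transfer time. The helper name and the 1 gig = 1024 MB convention are assumptions for illustration, and the rates are the hearsay figures above rather than measurements.

```python
def format_duration(seconds: float) -> str:
    """Render a duration as 'M min S s' (or just 'S s' under a minute)."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s" if minutes else f"{secs} s"


# Sustained rates in MB/s, taken from the rough figures in the text.
DRIVE_RATES_MB_S = {
    "laptop drive": 25,
    "standard 7200rpm drive": 60,
    "high-end drive": 100,
}

MB_PER_GIG = 1024  # assumption: binary gigabytes

for name, rate in DRIVE_RATES_MB_S.items():
    one_gig = format_duration(MB_PER_GIG / rate)
    four_gig = format_duration(4 * MB_PER_GIG / rate)
    print(f"{name}: 1 gig in {one_gig}, 4 gigs in {four_gig}")
```

Even the slowest drive in the table moves data well above the roughly 10 MB/s seen in the benchmark.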

So on to the network. The first thing to look at is probably the speed of your standard 100BASE-X (Fast Ethernet). That’s a theoretical maximum of 12.5 MB/s (according to Wikipedia), which translates to about 9-10 MB/s real world based on a few Google searches. That looks like our bottleneck. Moving up the line we have 1000BASE-X (Gigabit Ethernet) with a theoretical max of 125 MB/s and an apparent real-world rate of anywhere from 30 to 60 MB/s. In the wireless world, 802.11b appears to have a real-world max of about 0.5 MB/s, 802.11g about 2.5 MB/s, and 802.11n about 9.3 MB/s.
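The reasoning above boils down to two steps: convert the line rate from megabits to megabytes, then take the slower of disk and network as the effective rate. A rough Python sketch follows. The function names are hypothetical, the ranges quoted above are collapsed to midpoints, and the 60 MB/s disk is an assumed value, all for illustration only.

```python
# Real-world network rates in MB/s as quoted in the post.
# Ranges are collapsed to midpoints (an assumption for this sketch).
REAL_WORLD_MB_S = {
    "802.11b": 0.5,
    "802.11g": 2.5,
    "802.11n": 9.3,
    "Fast Ethernet": 9.5,      # midpoint of 9-10 MB/s
    "Gigabit Ethernet": 45.0,  # midpoint of 30-60 MB/s
}


def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a line rate in megabits/s to megabytes/s (8 bits per byte)."""
    return mbit_per_s / 8


def effective_rate(disk_mb_s: float, network_mb_s: float) -> float:
    """The slower of the two stages bounds the end-to-end transfer rate."""
    return min(disk_mb_s, network_mb_s)


print(mbit_to_mbyte_per_s(100))   # Fast Ethernet theoretical max in MB/s
print(mbit_to_mbyte_per_s(1000))  # Gigabit Ethernet theoretical max in MB/s

DISK_MB_S = 60.0  # assumed sustained rate of a standard 7200rpm drive
INDEX_MB = 2100   # index size from the benchmark

for name, net in REAL_WORLD_MB_S.items():
    rate = effective_rate(DISK_MB_S, net)
    print(f"{name:17s} {rate:5.1f} MB/s  {INDEX_MB / rate:7.0f} s")
```

With these assumed numbers, Fast Ethernet predicts roughly 221 seconds for the 2100 MB index, close to the 217 seconds reported, which fits the guess that the benchmark ran over a 100 Mbit network.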

6 Responses to “Built-In Solr Index Replication with Solr 1.4”

  1. We are pretty eagerly waiting to upgrade to 1.4. We are currently on 1.3 and the normal replication is very operationally heavy for us (in terms of deployment and other support).

    Hopefully we can upgrade to the new one soon :-)

    May 31, 2009 10:57 — Raghu Kashyap

Apache Solr, Solr, Apache Lucene, Lucene and their logos are trademarks of the Apache Software Foundation.

© 2011 Lucid Imagination. All rights reserved.