Apache Lucene Eurocon 2011 User Conference | Barcelona, October 17-20, 2011

Scaling Search at Trovit with Solr and Hadoop

Presented by Marc Sturlese, Trovit at Apache Lucene Eurocon 2011

Trovit is a global classified advertising service covering real estate, jobs and more in 27 countries worldwide. Until recently, our distributed Lucene/Solr search indexes used a customized Data Import Handler to draw data out of MySQL, but that setup could no longer handle our volumes with acceptable performance. We have moved to building Lucene/Solr indexes with MapReduce, and the new approach has been in production for several months. Here at Trovit, we deal with many countries and different business categories, each with its own index, and not all of them are similar in size or structure.

I'll present our experience as a combined use case/tutorial, beginning with a brief introduction to the main Solr features we use at Trovit, and then moving on to the more complex part:

  • Brief explanation of the data pipeline handled by Hadoop before our ads are indexed, with implementation details of the indexing process, deploying indexes from HDFS, etc.
  • Tuning performance parameters to improve indexing speed as much as possible and keep good search performance
  • Limiting the effect of GC at search time as far as possible when working with shards
  • Moving indexing-time Solr features such as deduplication to MapReduce
  • Using Solr analyzers to analyze large amounts of text outside of an indexing process

I'll also talk about how we used a phased indexing strategy to manage indexes across countries and verticals (jobs, autos, etc.) and how we worked around limitations in SOLR-1301.
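
To make the idea of moving deduplication out of Solr and into MapReduce more concrete, here is a minimal sketch in plain Python that simulates the map and reduce steps in memory. It is not Trovit's implementation and does not use Hadoop: the field names (`title`, `description`, `city`, `updated`), the MD5-based signature, and the "keep the most recent ad" rule are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict

# Hypothetical ad fields used to decide whether two ads are duplicates.
SIGNATURE_FIELDS = ("title", "description", "city")


def signature(ad):
    """Return an MD5 hex digest over the normalized signature fields."""
    # Lowercase and collapse whitespace so trivial differences do not matter.
    parts = [" ".join(str(ad.get(f, "")).lower().split()) for f in SIGNATURE_FIELDS]
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()


def map_phase(ads):
    """Map step: emit one (signature, ad) pair per input ad."""
    for ad in ads:
        yield signature(ad), ad


def reduce_phase(pairs):
    """Reduce step: group by signature and keep the most recent ad per group."""
    groups = defaultdict(list)
    for key, ad in pairs:
        groups[key].append(ad)
    for key in sorted(groups):
        # Assumed rule: the ad with the highest 'updated' value wins.
        yield max(groups[key], key=lambda ad: ad["updated"])


def deduplicate(ads):
    """Run the simulated map and reduce steps and return the surviving ads."""
    return list(reduce_phase(map_phase(ads)))
```

In a real Hadoop job the grouping by key would be done by the shuffle phase rather than by a dictionary, so each reducer call would see only the ads that share one signature, and only the surviving ad would be passed on to the indexing step.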
