
October 4, 2009

The high bar for relevancy?

Posted by David M. Fishman

A big chunk of the billions that go to search engine marketing (SEM) and search engine optimization (SEO), mostly to you-know-who, is spent on getting to Page 1 of the results.

I won’t be the first to point out that relevance for in-house search (i.e., without using PageRank) is a harder nut to crack. How much harder? A recent study from Aberdeen Group, publicized this week in InformationWeek, provides the following stat:

At top performing companies [defined as "Best in Class", the top 20% of those surveyed], 67% of searches returned the most relevant results on the first search results page, while lower rated companies saw relevant results on the first page for only 42% of searches.

Even at the best companies, 1 out of 3 searches doesn’t deliver the right result on the first results page. In other words, the best case for search=find is 67%.

Relevancy is as much art as science; the best solutions for the problem are the ones that provide a way to match the art to the science. If you need some background on relevancy, read the seminal article on relevancy and findability by Grant Ingersoll, and check out the fine presentation on the subject by Mark Bennett of New Idea Engineering delivered at the most recent SFBay Lucene/Solr Meetup we co-sponsored with the Computer History Museum in early September.

One of the best implementations of findability in Lucene and Solr I’ve come across is at Netflix. There’s a really nice discussion captured in some slides by Walter Underwood, who helped build the Solr search infrastructure at Netflix (a milestone in a very distinguished career in search). He gave a terrific presentation at that same most recent Meetup.

A key metric Walter used at Netflix to gauge finding (search relevancy effectiveness is such a mouthful) is called Mean Reciprocal Rank, or MRR. Simply put, it gives one point for a click-through to the first-ranked item, 1/2 a point for the second-ranked item, 1/3 of a point for the third-ranked item, and so on, averaged over all searches. While it may not help find relevancy bugs, it provides a very nice aggregate picture of users’ experience finding what they look for. A good benchmark, or stretch goal, according to Walter: 0.5 MRR, with 85% of clicks on #1.
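To make the arithmetic concrete, here is a minimal sketch of how MRR and the share of clicks on #1 could be computed from a click log. It is not Netflix's implementation: the function names and the input format (one clicked rank per search, with None for a search that got no click) are assumptions made for illustration, as is the choice to score a no-click search as zero.

```python
def mean_reciprocal_rank(clicked_ranks):
    """Average of 1/rank over all searches.

    clicked_ranks holds one entry per search: the 1-based rank of the
    result the user clicked, or None if nothing was clicked.
    Assumption: a search with no click contributes 0 to the average.
    """
    if not clicked_ranks:
        raise ValueError("need at least one search")
    total = sum(1.0 / rank for rank in clicked_ranks if rank is not None)
    return total / len(clicked_ranks)


def share_of_clicks_at_top(clicked_ranks):
    """Fraction of clicks (ignoring no-click searches) that landed on rank 1."""
    clicks = [rank for rank in clicked_ranks if rank is not None]
    if not clicks:
        return 0.0
    return sum(1 for rank in clicks if rank == 1) / len(clicks)


# Hypothetical log of six searches: three clicks on #1, one on #2,
# one on #4, and one search abandoned without a click.
sample = [1, 1, 2, None, 4, 1]
print(mean_reciprocal_rank(sample))    # (1 + 1 + 0.5 + 0 + 0.25 + 1) / 6
print(share_of_clicks_at_top(sample))  # 3 of 5 clicks were on #1
```

Under this convention the two numbers measure different things: abandoned searches pull MRR down without affecting the share of clicks on #1, so an MRR figure can sit well below the click share.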

Let me be quick to say that there is much that is unique about the Netflix search use case (and much that is really, really fun). But the contrast between 85% of clicks landing on the #1 result at Netflix and, at best, 2/3 of searches finding the most relevant results on the first page in best-in-class enterprise search implementations leads me to wonder: what are others doing to measure relevancy and programmatically build feedback loops, automatic or otherwise? Lucene and Solr provide transparent, rich interfaces for doing this, and according to the Aberdeen study, there’s plenty of opportunity to do so.

Category: Enterprise Search, Events, Lucene, Relevancy, Solr

One Response to “The high bar for relevancy?”

  1. Netflix may be an extreme case of known-item search, where people are always looking for the title (or a person) instead of a class of things. In addition, the studios spend millions of dollars teaching those titles (the search terms) to people. That doesn’t mean that people can spell “Ratatouille” or “Coraline”, but it helps.

    At a site selling cordless drills or fleece jackets, I seriously doubt you would get 85% of clicks on the first result. That would be dominated by informational searches, not known-item, known-title searches.

    If your site deals with mass-market artistic works (books, film, music), the searches might look more like Netflix’s.

    Thanks for the writeup, it was a fun meetup with lots of good presentations. The Babbage engine is amazing, too.

    October 5, 2009 10:30 — Walter Underwood


Apache Solr, Solr, Apache Lucene, Lucene and their logos are trademarks of the Apache Software Foundation.

© 2011 Lucid Imagination. All rights reserved.