August 12, 2009

Fake and Invisible Queries

Posted by Grant Ingersoll

Weird title, I know, but these are my pet names (there are probably better terms in use elsewhere) for two techniques that I find often help people solve search problems in the real world, even though they don’t necessarily seem like good things to do at first glance.

Fake Queries

If you’ve ever attended one of my trainings, you know I’m fond of saying “Just because a user types words into a search box doesn’t mean you have to execute a search”.  At first, such a saying seems counterintuitive, but in reality, it works at several levels:

  1. Many, many queries are repeats of previously executed queries where the data has not changed, so just return cached results.  Solr is HTTP Cache friendly, so use it to your advantage.  Additionally, properly tune your Solr caches and your JVM so that the O/S can cache things as well.  Also know when a cache is not effective and thereby avoid needlessly updating a cache that is never hit.
  2. Many times and for many reasons, you may already know what some answers are, independent of any particular user.  For instance, if someone types “benefits” into a search box on your company’s HR page, it likely makes sense to make sure that the first result is the main HR Benefits page, regardless of whether the system’s scoring actually gives it the best score.  Use something like the (poorly named, but effective) Query Elevation Component in Solr to set up a mapping between a query and a set of documents that should be returned for that query.  This can be used for editorial reasons, ad capabilities, etc.
  3. In certain cases, it may make sense just to go straight to the page.  Wikipedia often does this (try searching for Lucene.)   Technically, this may involve actually running a query and applying some heuristics to determine the one best result, but in other cases it may just mean mapping queries to one result editorially.
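
The three levels above can be sketched as a thin layer sitting in front of the search engine. This is only an illustration under stated assumptions: `FakeQueryFrontEnd`, `PINNED_RESULTS`, `DIRECT_PAGES` and the `search` callable are hypothetical names rather than part of Solr, and a real deployment would more likely lean on HTTP caching and the Query Elevation Component than on an in-process dictionary.

```python
from typing import Callable, Dict, List

# Hypothetical editorial tables; in practice these might live in the
# Query Elevation Component configuration or in a database.
PINNED_RESULTS: Dict[str, List[str]] = {"benefits": ["hr-benefits-main"]}
DIRECT_PAGES: Dict[str, str] = {"lucene": "/wiki/Lucene"}


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries match."""
    return " ".join(query.lower().split())


class FakeQueryFrontEnd:
    """Answers what it can without searching; `search` is an assumed backend call."""

    def __init__(self, search: Callable[[str], List[str]]) -> None:
        self._search = search
        self._cache: Dict[str, List[str]] = {}
        self.searches_executed = 0

    def invalidate(self) -> None:
        """Drop cached results, e.g. after the index has changed."""
        self._cache.clear()

    def handle(self, query: str) -> dict:
        key = normalize(query)
        # Level 3: some queries map editorially straight to a single page.
        if key in DIRECT_PAGES:
            return {"redirect": DIRECT_PAGES[key], "results": []}
        # Level 1: repeated queries are served from the cache.
        if key not in self._cache:
            self.searches_executed += 1
            self._cache[key] = self._search(key)
        results = self._cache[key]
        # Level 2: known-good answers go first, whatever their score was.
        pinned = PINNED_RESULTS.get(key, [])
        merged = pinned + [doc for doc in results if doc not in pinned]
        return {"redirect": None, "results": merged}
```

Calling `handle("benefits")` twice runs the backend search only once, and the pinned HR page is moved to the top of whatever the backend returned. Call `invalidate()` whenever the underlying data changes, since the cache is only valid while the data has not.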

Invisible Queries

In many applications it is necessary to execute more than one query for any given user query.  For instance, in applications that require very high precision (only good results, forgoing marginal results), the app may have several fields: one for exact matches, one for case-insensitive matches and yet another with stemming.  Given a user query, the app may try the query against the exact match field first and, if there is a result, return only that set.  If there are no results, then the app would proceed to search the next field, and so on.  Another example of Invisible Queries is pseudo-relevance feedback, whereby the top X results are assumed to be good and are automatically used to create a new query, which is then submitted and whose results are returned.  Solr and Lucene’s More Like This is an example.  Additionally, one could automatically submit spell-checking suggestions in cases where the original query returns no results.
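
The field-by-field fallback described above might look roughly like the following. It is a minimal sketch, not a Solr API: the field names in `FIELD_CASCADE`, the `search(field, query)` callable and the optional `suggest(query)` spell-checker hook are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical field names, ordered from most to least precise.
FIELD_CASCADE: Sequence[str] = ("title_exact", "title_lower", "title_stemmed")

SearchFn = Callable[[str, str], List[str]]  # (field, query) -> document ids


def cascade_search(
    query: str,
    search: SearchFn,
    suggest: Optional[Callable[[str], Optional[str]]] = None,
    fields: Sequence[str] = FIELD_CASCADE,
) -> Tuple[Optional[str], List[str]]:
    """Return (field_that_matched, results), stopping at the first field with hits."""
    for field in fields:
        results = search(field, query)
        if results:
            # Short-circuit: looser fields are never queried once one matches.
            return field, results
    # No field matched: retry once with a spelling suggestion, if available.
    if suggest is not None:
        corrected = suggest(query)
        if corrected and corrected != query:
            return cascade_search(corrected, search, None, fields)
    return None, []
```

Returning the name of the field that matched alongside the results makes it easy to log which stage answered each query. Passing `None` as the suggester on the retry guarantees the spelling fallback runs at most once.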

Naturally, all of this brings up the performance question.  How can this possibly perform?  The answer is that it may not, so you need to test it.  However, I’ve seen lots of applications where it does, especially when the queries are used in a short-circuiting manner (and not in an additive manner).  Additionally, you need to keep an eye on your logs, etc.  In the right applications, it may be the case that a lot of your queries are exact matches.  In other cases, your users may very well be willing to trade off a few hundred extra milliseconds (often less) in order to have better results.
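
One way to keep an eye on this is to record, for each user query, which stage finally answered it and how long the whole request took, then summarize. The sketch below assumes a hypothetical log of `(stage, elapsed_ms)` pairs; the stage labels and the record format are illustrative, not something Solr emits in this form.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def summarize(log: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Per stage: the share of queries it answered and their median latency in ms."""
    by_stage: Dict[str, List[float]] = defaultdict(list)
    for stage, elapsed_ms in log:
        by_stage[stage].append(elapsed_ms)
    total = sum(len(times) for times in by_stage.values())
    return {
        stage: {"share": len(times) / total, "median_ms": median(times)}
        for stage, times in by_stage.items()
    }
```

If most queries are answered at the first stage, the extra queries cost little overall. If most fall through to the last stage, the cascade is adding latency to nearly every request and deserves a second look.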

Next time you’re in need of a speed boost, or perhaps are unclear on how to get exactly the results you need, I hope some fake and invisible queries will help you out!

Category: Enterprise Search, Lucene, Solr
