<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; Jay Hill</title>
	<atom:link href="http://www.lucidimagination.com/blog/author/jay-hill/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Solr Shines Through the Cloud &#8211; LucidWorks Solr on EC2</title>
		<link>http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/</link>
		<comments>http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/#comments</comments>
		<pubDate>Mon, 01 Feb 2010 16:47:21 +0000</pubDate>
		<dc:creator>Jay Hill</dc:creator>
				<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[solr cloud]]></category>
		<category><![CDATA[solr on EC2]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1448</guid>
		<description><![CDATA[<h1>Overview</h1>
<p>Lucid Imagination has now packaged the latest version of our <a href="http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr"><strong>LucidWorks Certified Distribution for Solr 1.4</strong></a> as an <a href="http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/index.html?concepts-amis-and-instances.html"><strong>Amazon Machine Image</strong></a> (AMI). It&#8217;s available for free to anyone with an <a href="http://aws.amazon.com">Amazon Web Services</a> account and Amazon EC2 access. In this post, I&#8217;ll outline what it is, what it can do for you, and how to get it started.</p>
<p>Our free AMI running LucidWorks for Solr offers the following benefits:</p>
<ol>
<li>A very </li>&#8230;</ol>]]></description>
			<content:encoded><![CDATA[<h1>Overview</h1>
<p>Lucid Imagination has now packaged the latest version of our <a href="http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr"><strong>LucidWorks Certified Distribution for Solr 1.4</strong></a> as an <a href="http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/index.html?concepts-amis-and-instances.html"><strong>Amazon Machine Image</strong></a> (AMI). It&#8217;s available for free to anyone with an <a href="http://aws.amazon.com">Amazon Web Services</a> account and Amazon EC2 access. In this post, I&#8217;ll outline what it is, what it can do for you, and how to get it started.</p>
<p>Our free AMI running LucidWorks for Solr offers the following benefits:</p>
<ol>
<li>A very straightforward way to get a Solr instance up and running for those who are new to Solr and want to try it for the first time.</li>
<li>An easy way to test out the LucidWorks certified distribution of Solr.</li>
<li>A convenient starting point with a machine instance that can be used by anyone interested in running Solr in the cloud.</li>
</ol>
<p>The AMI we&#8217;ve provided is a very basic Solr instance using the default example configuration files that Solr includes as a starting place for new users, along with a set of nineteen documents indexed and ready for searching. If you are interested in building Solr instances in the cloud, this AMI is a good place to begin: simply edit the configuration files for your needs, set up persistent storage with Amazon <a href="http://aws.amazon.com/ebs/">EBS</a> or <a href="https://s3.amazonaws.com/">S3</a> for your indexes, and you have a machine instance that can be bundled and registered for your own use.</p>
<h1>Amazon EC2</h1>
<p>Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. Cloud computing is a way to offer internet-based services, sometimes grouped into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). Services are sold on demand, and consumers are charged by the hour for what they use. EC2 is one of the most popular cloud computing services.</p>
<p>Running LucidWorks for Solr on Amazon EC2 requires an Amazon Web Services account and some familiarity with deploying an Amazon Machine Image. We&#8217;ll describe the process step by step to get you up and running easily.</p>
<p>These instructions assume you are already familiar with running machine instances in EC2. We provide detailed instructions on how to get the LucidWorks instance up and running, but this is not a tutorial on how to use EC2 if you are not already familiar with it. If you do not yet have an Amazon Web Services account, and are unfamiliar with EC2, here is a <a href="http://aws.amazon.com/ec2/">link to an introduction on how to get started</a>. Once you have an account and are comfortable working with the EC2 tools, you&#8217;ll be ready to proceed to the next steps in this guide.</p>
<h1><img class="alignnone" title="Download LucidWorks for Solr" src="http://www.lucidimagination.com/sites/all/themes/lucid-dev/images/Lucidworks4solr.png" alt="Download LucidWorks for Solr" width="159" height="44" /></h1>
<p>LucidWorks for Solr version 1.4, as you may already know, is a comprehensive, tested and release-stable certified distribution of Apache Solr, available for free. LucidWorks for Solr offers:</p>
<ul>
<li>A solid foundation of <strong>reliability and consistency </strong>for production-grade use of open source</li>
<li><strong>Fast, convenient access to documentation </strong>needed to build better search applications faster</li>
<li><strong>Quick, simplified setup </strong>and maintenance of Solr and its constituent components</li>
</ul>
<p>Running LucidWorks for Solr on EC2 is possibly the easiest and fastest way to get a Solr instance up and running.</p>
<h1>Steps to Run LucidWorks for Solr on Amazon EC2</h1>
<p>OK, let&#8217;s create an instance of the LucidWorks for Solr AMI. We&#8217;ll show how to do this using the AWS Management Console. You could also use Elasticfox, a Mozilla Firefox extension for managing your Amazon EC2 account. If you are experienced with Elasticfox the steps will be similar to these and should be familiar to you.</p>
<h2>AWS Management Console Instructions</h2>
<p>Log into your AWS account. You should be at the start location for the Management Console:</p>
<h1><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/awsConsole1.png"><img class="alignnone size-full wp-image-1692" style="border: 1px solid black;" title="awsConsole1" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/awsConsole1-e1264962077199.png" alt="" width="600" height="321" /></a></h1>
<p>Before we launch the LucidWorks AMI we&#8217;ll need to first set up some security settings to allow access to two ports on the instance: 8983 for Solr, and 22 for ssh.</p>
<p>From the AWS Management Console under the Network &amp; Security Settings section click on the &#8220;Security Groups&#8221; link, and from the Security page click on the &#8220;Create Security Group&#8221; button, and then enter the following values:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/security1.png"><img class="alignnone size-full wp-image-1730" style="border: 1px solid black;" title="security1" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/security1.png" alt="" width="460" height="229" /></a></p>
<p>At the bottom of the security page you will need to set up and save two connection methods, one for SSH, the other a &#8220;Custom&#8221; connection for the Solr port:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/security2.png"><img class="alignnone size-full wp-image-1731" title="security2" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/security2-e1264971381931.png" alt="" width="500" height="159" /></a></p>
<p>We&#8217;ll use that new security group when we launch an instance of the LucidWorks AMI.</p>
<p>Now click on the &#8220;AMIs&#8221; link under the Images section. In the &#8220;Viewing&#8221; drop-down box select &#8220;Public Images&#8221;, and in the input box enter &#8220;lucidworks&#8221;. You should see one machine image available:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/imagesAMIlucidworks.png"><img class="alignnone size-full wp-image-1700" title="imagesAMIlucidworks" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/imagesAMIlucidworks.png" alt="" width="530" height="153" /></a></p>
<p>Check the checkbox to the left and click on the &#8220;Launch&#8221; button above it.</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch.png"><img class="alignnone size-full wp-image-1703" style="border: 1px solid black;" title="launch" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch-e1264962654518.png" alt="" width="500" height="332" /></a></p>
<p>Select the options for number of instances, machine size, and availability zone. The defaults are fine to get started.</p>
<p>On the second page just take the default values presented:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch2.png"><img class="alignnone size-full wp-image-1704" style="border: 1px solid black;" title="launch2" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch2-e1264962857331.png" alt="" width="500" height="332" /></a></p>
<p>On the third page select your key pairs, which you should already have set up:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch3.png"><img class="alignnone size-full wp-image-1709" style="border: 1px solid black;" title="launch3" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch3-e1264963138426.png" alt="" width="500" height="333" /></a></p>
<p>On the next page you&#8217;re asked to select the security groups to enable. Select the solr8983 group that we configured earlier. This will allow connections to port 8983 for Solr and port 22 for ssh:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch4.png"><img class="alignnone size-full wp-image-1710" style="border: 1px solid black;" title="launch4" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch4-e1264963259113.png" alt="" width="500" height="333" /></a></p>
<p>Finally you&#8217;re asked to review your settings. If everything looks good go ahead and click the &#8220;Launch&#8221; button:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch5.png"><img class="alignnone size-full wp-image-1711" style="border: 1px solid black;" title="launch5" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch5-e1264963454697.png" alt="" width="500" height="337" /></a></p>
<p>You should see a confirmation that the instance is launching:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch6.png"><img class="alignnone size-full wp-image-1712" style="border: 1px solid black;" title="launch6" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/launch6-e1264963609145.png" alt="" width="500" height="249" /></a></p>
<p>Now your instance should be starting up. Back at the main AWS Management Console page click on the &#8220;Instances&#8221; link in the Instances section:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/instancePending.png"><img class="alignnone size-full wp-image-1713" title="instancePending" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/instancePending-e1264963738334.png" alt="" width="500" height="97" /></a></p>
<p>It will take a few minutes for the instance to launch. Reload the page until the Status changes from &#8220;pending&#8221; to &#8220;running&#8221;. Once the Status shows the instance is &#8220;running&#8221; you should be able to connect to the Solr Admin Console. You&#8217;ll need to know the public DNS of your instance. With the checkbox checked on this instance there will be a few lines of status output at the bottom of the AWS Management Console that look like this:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/status.png"><img class="alignnone size-full wp-image-1716" style="border: 1px solid black;" title="status" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/status.png" alt="" width="512" height="282" /></a></p>
<p>Use the public DNS to build the URL to access the Solr Admin Console: http://ec2-174-129-150-181.compute-1.amazonaws.com:8983/solr/admin/</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/lwcdAdminConsole.png"><img class="alignnone size-full wp-image-1719" style="border: 1px solid black;" title="lwcdAdminConsole" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/lwcdAdminConsole-e1264969035974.png" alt="" width="500" height="258" /></a></p>
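<p>If you&#8217;d rather script this step, here is a minimal Python sketch (not part of the AMI) that builds the Admin Console URL from the public DNS name and checks whether a port opened in the security group is reachable. The function names are my own, and the host name is just the example from above; substitute the public DNS of your own instance.</p>
<pre><code class="language-python">import socket

SOLR_PORT = 8983  # opened in the security group for Solr
SSH_PORT = 22     # opened in the security group for ssh


def solr_admin_url(public_dns, port=SOLR_PORT):
    """Build the Solr Admin Console URL for an instance's public DNS name."""
    return "http://{0}:{1}/solr/admin/".format(public_dns.strip(), port)


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(solr_admin_url("ec2-174-129-150-181.compute-1.amazonaws.com"))
</code></pre>
<p>This prints the same URL shown above. Once the instance is running and Solr has had a moment to start, <code>port_is_open(host, SOLR_PORT)</code> and <code>port_is_open(host, SSH_PORT)</code> should both return True; if either returns False, the security group rules are the first thing to recheck.</p>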
<p>Now you&#8217;ve got a Solr instance up and running with nineteen sample documents in the index. If you&#8217;re new to Solr make sure to <a href="http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide">download Lucid Imagination&#8217;s free reference guide</a> as a good start to learning about Solr.</p>
<p><a href="http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide"><img class=" alignnone" style="margin: 10px;" title="LucidWorks for Solr Reference Guide" src="http://www.lucidimagination.com/files/image/LG4S_referenceguide_cover2.png" alt="LucidWorks for Solr Reference Guide" width="200" height="240" /></a></p>
<h1>Final Step: Logging into the Instance to Stop and Start LucidWorks for Solr</h1>
<p>With the instance now up and running you may want to make changes, index more content, etc. You&#8217;ll need to ssh into the machine instance to do these tasks. From the AWS Management Console select the machine instance and find the &#8220;Instance Actions&#8221; list. From that list select the &#8220;Connect&#8221; option:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/connect.png"><img class="size-full wp-image-1726 alignnone" title="connect" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/connect.png" alt="" width="260" height="358" /></a></p>
<p>You should see instructions for connecting to your instance via ssh:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/ssh.png"><img class="alignnone size-full wp-image-1727" style="border: 1px solid black;" title="ssh" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/ssh.png" alt="" width="460" height="433" /></a></p>
<p>Once you have successfully logged into the instance you will find the LucidWorks for Solr home directory here: /opt/LucidWorks. Within that directory there are &#8220;start&#8221; and &#8220;stop&#8221; scripts for starting and stopping Solr, which you will need to do if you make changes to the schema.xml or solrconfig.xml files. The Solr home directory is /opt/LucidWorks/lucidworks/solr and the index directory is /opt/LucidWorks/lucidworks/solr/data/index.</p>
<p>The instance is configured to start up with the following heap size settings: -Xms1g -Xmx3g. These are set in the Tomcat start script that manages LucidWorks: /opt/LucidWorks/lucidworks/tomcat/bin/catalina.sh</p>
<p>Note that the machine instance does not have any persistent storage, so if you want to start working with your own data and creating indexes that persist, you will need to use the LucidWorks AMI as a starting point, and you will have to add permanent storage, either Amazon EBS or Amazon S3.</p>
<p>Hopefully this has given you a chance to experience Solr running in the cloud, and shown some of the basics you&#8217;ll need to build on to implement cloud-hosted distributed search with LucidWorks for Solr.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>The Seven Deadly Sins of Solr</title>
		<link>http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr/</link>
		<comments>http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 07:04:41 +0000</pubDate>
		<dc:creator>Jay Hill</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1519</guid>
		<description><![CDATA[<table class="plain" border="0" width="100%" bordercolor="#ffccff">
<tbody>
<tr valign="top">
<td width="60%">Working at Lucid Imagination gives me the opportunity to analyze and evaluate a great many instances of Solr implementations, running in some of the largest Fortune 500 companies as well as some of the smallest start-ups. This experience has enabled me to identify many common mistakes and pitfalls that occur, either when starting out with a new Solr implementation, or by not keeping up with the latest improvements and changes. Thanks to my colleague Simon Rosenthal </td></tr></tbody>&#8230;</table>]]></description>
			<content:encoded><![CDATA[<table class="plain" border="0" width="100%" bordercolor="#ffccff">
<tbody>
<tr valign="top">
<td width="60%">Working at Lucid Imagination gives me the opportunity to analyze and evaluate a great many instances of Solr implementations, running in some of the largest Fortune 500 companies as well as some of the smallest start-ups. This experience has enabled me to identify many common mistakes and pitfalls that occur, either when starting out with a new Solr implementation, or by not keeping up with the latest improvements and changes. Thanks to my colleague Simon Rosenthal for suggesting the title, and to Simon, Lance Norskog, and Tom Hill for helpful input and suggestions. So, without further ado…the <em>Seven Deadly Sins of Solr.</em></td>
<td width="5%"> </td>
<td width="35%"><strong><em>You might also be interested in:</em></strong>
<ul>
<li><a href="http://www.lucidimagination.com/Solutions/Webinars/Analyze-This-Tips-and-tricks-getting-LuceneSolr-Analyzer-index-and-search-your-content">Analyze This! Tips and tricks on getting the Lucene/Solr Analyzer to index and search your content right</a> &#8211; On-demand Webinar</li>
<li><a href="http://www.lucidimagination.com/solutions/Webinars/Apache-Solr-14-Faster-Easier-and-More-Versatile-Ever">Apache Solr 1.4: Faster, Easier and More Versatile than Ever</a> &#8211; On-demand Webinar</li>
<li><a href="http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr">Solr 1.4 Download</a></li>
<li><a href="http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide">Solr 1.4 Reference Guide</a></li>
</ul>
</td>
</tr>
</tbody>
</table>
<h2>Sin number 1: Sloth</h2>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/sloth.png"><img class="alignnone size-medium wp-image-1522" title="sloth" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/sloth-300x200.png" alt="I'll do it later" width="300" height="200" /></a></p>
<p>Let&#8217;s define sloth as <em>laziness or indifference</em>. This one bites most of us at some time or another. We just can&#8217;t resist the impulse to take a shortcut, or we simply refuse to acknowledge the amount of effort required to do a task properly. Ultimately we wind up paying the price, usually with interest. Here are some common examples of how laziness or indifference lead to Solr problems.</p>
<ul>
<li><strong>A general lack of commitment either to Solr or to the search application project itself</strong>. Sometimes you see this when a company has decided to switch from a commercial search application to open-source alternatives like Lucene and Solr. The engineers involved in the project are used to the &#8220;old ways&#8221; and really don&#8217;t feel like mastering <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/blind.png"><img class="alignright size-thumbnail wp-image-1551" title="blind" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/blind-150x150.png" alt="" width="150" height="150" /></a>another search technology. So without making even the slightest effort they will claim that Solr is inefficient, difficult to learn, not worth the effort, etc. If you&#8217;re hungry it&#8217;s usually not productive to stand around waiting for a fried chicken to fly into your mouth &#8211; your time might be better spent being a little more active in your efforts to acquire some food. Open-source software is flexible, adaptable, and powerful, but the developers that become the experts at open-source solutions are those that are not afraid to roll up their sleeves and dive in to learn what they need. Participate in the mailing lists. Open the source code. Read the javadocs and the wiki. I&#8217;ve worked with customers that have embraced Solr and become experts in a fairly short amount of time, even contributing patches within a few weeks of starting their project. On the other hand I&#8217;ve seen problems fester and grow because a team just won&#8217;t put any effort into their Solr implementation. &#8220;There are none so blind as those who will not see.&#8221;</li>
<li><strong>Not reviewing, editing, or changing the default schema.xml and/or solrconfig.xml files</strong>. If I had a dollar for every production Solr instance that was statically warming the query &#8220;solr rocks&#8221; I could afford a year&#8217;s worth of support from a commercial search vendor. The default example configuration files are there to be used as, yes, <em>examples</em>, and as starting points. Take the time to learn about the configuration settings and field types, and make the best use of them. Remove anything that is not being used (how many times have you really queried that &#8220;partitioned&#8221; request handler?). Keep your configuration files lean and mean and maintainable and it will pay off in the long run.</li>
<li><strong>Ignoring the dismax query parser</strong>. I&#8217;ve seen cases where someone has written a custom query parser on their own when the work they needed to do could have easily been done with the dismax query parser. There are two different extremes to why folks sometimes avoid dismax. On one side there is the feeling that it is a &#8220;dumbed down&#8221; parser. I think part of the problem here is caused by the first line on the DismaxRequestHandler wiki page (and by the way, we still suffer from this unfortunate legacy nomenclature &#8211; it is a query parser, not a request handler) which says that dismax is &#8220;designed to process simple user entered phrases&#8221;. There is sometimes a feeling that it is merely an entry-level tool for those who don&#8217;t want to do any work crafting their queries. Au contraire! Dismax has an enormous amount of power and flexibility. Which leads to the second side of &#8220;dismax avoidance&#8221;, namely that it&#8217;s &#8220;too complicated&#8221;. Indeed, it is somewhat complicated. But the rewards of spending some time to get familiar with it are substantial.</li>
<li><strong>Not enough attention on JVM settings and garbage collection</strong>. One needn&#8217;t become a JVM Jedi to run a well-tuned Solr instance, but some time spent on learning the basics about different garbage collector types and monitoring the JVM with tools such as JConsole will pay dividends. A good starting place is a <a href="http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/">blog post by my colleague Mark Miller</a> of Lucid Imagination. Another good resource is this <a href="http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html">document put out by Sun</a>.</li>
</ul>
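<p>Coming back to the dismax bullet above: to give a feel for that flexibility, here is a small Python sketch that assembles the request parameters for a dismax query. The parameter names (defType, qf, pf, mm, tie) are the standard dismax ones; the field names <code>title</code> and <code>body</code> and all of the values are hypothetical and would need to match your own schema and relevance needs.</p>
<pre><code class="language-python">from urllib.parse import urlencode


def dismax_query_string(user_query):
    """Build a URL-encoded parameter string for a dismax query.

    Field names and boost values below are placeholders, not recommendations.
    """
    params = [
        ("defType", "dismax"),     # select the dismax query parser
        ("q", user_query),         # raw user input, no query syntax required
        ("qf", "title^2.0 body"),  # fields to search, with per-field boosts
        ("pf", "title"),           # extra boost when the whole phrase matches
        ("mm", "2"),               # minimum number of optional clauses to match
        ("tie", "0.1"),            # tie-breaker between per-field scores
    ]
    return urlencode(params)


print(dismax_query_string("apple ipod"))
</code></pre>
<p>Append the resulting string to your select URL after a question mark. None of this needs a custom query parser; it is just configuration, and the same parameters can live as defaults in solrconfig.xml instead of on every request.</p>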
<h2>Sin number 2: Greed</h2>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/greed.png"><img class="alignnone size-medium wp-image-1523" title="greed" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/greed-234x300.png" alt="We already have plenty of RAM" width="234" height="300" /></a></p>
<p>Penny-wise and pound-foolish. This is a surprisingly all-too-common trap that some fall into. Obviously not everyone has an unbounded budget, but sometimes terrible decisions are made to constrain resources, decisions that will prove to be more costly over time. For example:</p>
<ul>
<li><strong>Refusal to add the proper amount of RAM to a server</strong>. There have been occasions when I&#8217;ve had more RAM on my Mac laptop (4GB) than some of the production Solr implementations I&#8217;ve seen. Sometimes even Solr projects at large companies have been under-funded. There will be business requirements that make high memory demands (sorting on large String fields, lots of faceting on fields with huge numbers of distinct terms, etc.) but the expectation will be that this can somehow be &#8220;made to work&#8221; with an insufficient amount of RAM and some kind of wizardry. A friend of mine has a saying, &#8220;You can&#8217;t fit 15 pounds of rice into a 10 pound bag.&#8221; By all means commit to at least acquiring the minimum adequate amount of resources.</li>
<li><strong>Insisting on running indexing and searching on the same host</strong>. One of the first recommendations we at Lucid Imagination often make to customers is to separate the indexing and searching process to (at least) two separate nodes. There are several benefits to be gained by doing this. First, the indexing and searching processes are not competing for resources (cpu, memory, etc.). Second, the master and slave(s) can be configured slightly differently for optimum performance. Be sure to budget for adequate hardware based on your document count, index size, and expected query volume.</li>
</ul>
<h2>Sin number 3: Pride</h2>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/pride.png"><img class="alignnone size-medium wp-image-1526" title="pride" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/pride-180x300.png" alt="I've got this covered" width="180" height="300" /></a></p>
<p>Pride (for our purposes): failing to acknowledge the good work of others. An excessive love of self.</p>
<p>Engineers love to code. Sometimes to the point of wanting to create custom work that may have a solution in place already, just because: a) They believe they can do it better. b) They believe they can learn by going through the process. c) It &#8220;would be fun&#8221;. This is not meant to discourage new work to help out with an open-source project, to contribute bug fixes, or certainly to improve existing functionality. But be careful not to rush off and start coding before you know what options already exist. Measure twice, cut once.</p>
<ul>
<li><strong>Don&#8217;t re-invent the wheel</strong>. I&#8217;ve seen developers almost look for excuses to write their own query parser or other custom component. Sometimes such effort is necessary, and luckily open-source software makes this doable in ways that would never be possible with commercial search software. But make sure you have a real need before writing custom code &#8211; at least while on the company&#8217;s dime. There is extra effort in maintaining a custom codebase and keeping it in sync with Solr, so make sure it really is the only option to solve a particular use case.</li>
<li><strong>Make use of the mailing lists and the list archives</strong>. This should be obvious, but there are still many who think that this is beneath them in some way, as if asking for help was somehow a flaw. On the other hand, when posting to the mailing lists, make efficient use of everyone&#8217;s time. Be sure to thoroughly search the list archives before posting. (<a href="http://search.lucidimagination.com">LucidFind</a> makes it a snap to search relevant mailing lists, wikis, blogs, javadoc, and other sources in one place.) If and when you do post a question provide a succinct description of the problem and make it clear to others what you need. Stay on topic throughout the thread. Lucene and Solr committers and Lucid Imagination staff are regular participants on the mailing lists, so take advantage of this resource when you have a real need.</li>
</ul>
<h2>Sin number 4: Lust</h2>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/lust.png"><img class="alignnone size-medium wp-image-1529" title="lust" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/lust-280x300.png" alt="Must have more!" width="280" height="300" /></a></p>
<p>You&#8217;ll have to grant me artistic license on this one, or else we won&#8217;t be able to keep this blog G-rated. So let&#8217;s define lust as &#8220;an unnatural craving for something to the point of self-indulgence or lunacy&#8221;. OK.</p>
<ul>
<li><strong>Setting the JVM Heap size too high, not leaving enough RAM for the OS</strong>. So we finally get the RAM allocation we&#8217;ve been pining for (see: Greed) and now what do we do? We&#8217;ve got 16GB of RAM on our machine now so we allocate 15GB to the heap where Solr is running. Whoa! Time for a cold shower! Solr may be the only object of your desire, but don&#8217;t neglect the operating system. Patience and attention come into play here. Monitor the JVM under load and determine what the real memory footprint of Solr is. You&#8217;ll want the operating system to be able to cache file system data (especially the Lucene indexes) so be sure to leave enough RAM for the OS.</li>
<li><strong>Too much attention on the JVM and garbage collection</strong>. On the other hand (and in direct contrast to our last bullet point under Sloth), don&#8217;t overdo it on the JVM. There are seemingly unending ways to tweak and tune a JVM. Don&#8217;t fall into the trap of trying every arcane JVM or GC setting unless you are a JVM expert. Once you have mastered the basics of the JVM and understand the differences between the different types of garbage collectors, for the most part you shouldn&#8217;t have to get too creative. Don&#8217;t just toss &#8220;-XX:CMSIncrementalDutyCycleMin=10&#8221; into the mix out of curiosity.</li>
<li><strong>Trying to &#8220;push the envelope&#8221; on auto-warm counts</strong>. How warm is too warm? We all want the fastest search response times possible, and auto-warming Solr&#8217;s queryResultCache and filterCache is an important tool to help keep responses for the most popular queries as fast as possible. But let&#8217;s not get carried away. Excessive auto-warm counts can cause long warm-up <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/flame.png"><img class="alignright size-thumbnail wp-image-1571" title="flame" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/flame-150x150.png" alt="" width="150" height="150" /></a>times for the caches, which in turn affect the warm-up time of new IndexSearchers after every commit. Ask yourself if you really need to auto-warm the top 5,000 queries every time a commit occurs. It&#8217;s very easy to get obsessed with this and find that the time to warm up a new IndexSearcher extends beyond your commit cycles, which can lead to all kinds of odd behavior, including OutOfMemory Exceptions. Make sure you know what your average warm-up times are for all of your caches and your new IndexSearchers. It&#8217;s actually best to start out with more modest auto-warm counts and work up if necessary rather than start out too high. Create reports or database records of user queries by parsing your production log files. Use that data to get a feel for what the most popular queries are. Sometimes just setting an auto-warm count to 100 is plenty. But it takes time and effort to find the sweet spot between caches that are too cool and caches that are &#8220;en fuego&#8221;.</li>
</ul>
<h2>Sin number 5: Envy</h2>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/envy.png"><img class="alignnone size-full wp-image-1530" title="envy" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/envy.png" alt="Bah!" width="295" height="259" /></a></p>
<ul>
<li><strong>Wanting features that other sites have, that you really don&#8217;t need</strong>. Stay focused on your business needs. Make sure you know what you really need from your search application. A common scenario on the mailing lists is one that Lucene/Solr committer Chris Hostetter calls the &#8220;XY&#8221; problem. From the Solr user mailing list: <em>&#8220;You are dealing with &#8216;X&#8217;, you are assuming &#8216;Y&#8217; will help you, and you are asking about &#8216;Y&#8217; without giving more details about the &#8216;X&#8217; so that we can understand the full issue.  Perhaps the best solution doesn&#8217;t involve &#8216;Y&#8217; at all&#8221;</em>. Know what you need and keep focused on the requirements.</li>
<li><strong>Wanting to have a bigger index than the other guy</strong>. The antithesis of the &#8220;greed&#8221; issue of not allocating enough resources. &#8220;Shooting for the moon&#8221; and trying to allow for possible growth over the next 20 years. A trap for those who believe their status is determined by the size of their server farm. By all means plan ahead, but don&#8217;t expect that you can see into the future to foresee every possible scenario. Plan smartly, but don&#8217;t overdo it.</li>
</ul>
<h2>Sin number 6: Gluttony</h2>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/glutonny.png"><img class="size-medium wp-image-1531 alignnone" title="glutonny" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/glutonny-300x225.png" alt="" width="300" height="225" /></a></p>
<p>&#8220;Staying fit and trim&#8221; is usually good practice when designing and running Solr applications. A lot of these issues cross over into the &#8220;Sloth&#8221; category, and are generally cases where the extra effort to keep your configuration and data efficiently managed is not considered important.</p>
<ul>
<li><strong>Lack of attention to field configuration in the schema</strong>. Storing fields that will never be retrieved. Indexing fields that will never be searched. Storing term vectors, positions and offsets when they will never be used. Unnecessary bloat. Understand your data and your users and design your schema and fields accordingly.</li>
<li><strong>Unexamined queries that are redundant or inefficient</strong>. I&#8217;ve seen cases where queries have been generated programmatically with a lot of redundancy and nonsensical logic. Take advantage of filter queries whenever possible. Each filter query is cached separately in the filterCache and does not affect relevance scoring, so restrictions that recur across many requests can be reused. For example, if you have a query like this &#8211; &amp;q=content:solr AND datasource:wiki AND category:search AND language:en &#8211; use filter queries on fields where it makes sense: &amp;q=content:solr&amp;fq=datasource:wiki&amp;fq=category:search&amp;fq=language:en.</li>
</ul>
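<p>As a rough illustration of the filter-query rewrite above, here is a minimal Python sketch that assembles the request parameters with one q clause and one fq per restriction. The function name build_select_query is hypothetical, and the field names are simply the ones from the example; treat this as a sketch of the idea, not a client library.</p>

```python
from urllib.parse import urlencode


def build_select_query(main_query, filters):
    """Return a URL-encoded query string with one q and one fq per filter."""
    # The scored part of the search goes in q.
    params = [("q", main_query)]
    # Each restriction becomes its own fq so it can be cached on its own.
    params.extend(("fq", clause) for clause in filters)
    return urlencode(params)


query_string = build_select_query(
    "content:solr",
    ["datasource:wiki", "category:search", "language:en"],
)
print(query_string)
```

<p>Keeping each restriction in its own fq (rather than one combined fq) means a filter such as language:en can be reused by unrelated queries.</p>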
<h2>Sin number 7: Wrath</h2>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/wrath.png"><img class="alignnone size-medium wp-image-1532" title="wrath" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/01/wrath-300x225.png" alt="Now!" width="300" height="225" /></a></p>
<p>While wrath is usually considered to be synonymous with anger, let&#8217;s use an older definition here: &#8220;a vehement denial of the truth, both to others and in the form of self-denial, impatience.&#8221;</p>
<ul>
<li><strong>Assuming you will never need to re-index your data</strong>. It&#8217;s easy to focus on schema design, configuration, deployment, scaling issues, performance, and relevance tuning, while neglecting to consider how to re-create your index in the event of a disaster, either major or minor. One step that should never be omitted from your planning is deciding how to re-create your index after a hardware failure. If you are replicating from a master to a slave or slaves, consider having an extra slave that might not be used for searching but can receive replications of the index to serve as a backup to the master. If feasible, back up your index data to other storage media. At the very least, if you don&#8217;t have a large index and can re-create it without too much effort if it is deleted or lost, make sure you have a plan and procedures in place in preparation for quickly re-indexing.</li>
<li><strong>Rushing to production</strong>. Of course we all have deadlines, but you only get one chance to make a first impression. Years ago I was part of a project where we released our search application prematurely (ahead of schedule) because the business decided it was better to have <em>something</em> in place rather than not have a search option. We developers felt that, with another four weeks of work we could deliver a fully-ready system that would be an excellent search application. But we rushed to production with some major flaws. Customers of ours were furious when they searched for their products and couldn&#8217;t find them. We developed a bad reputation, angered some business partners, and lost money just because it was deemed necessary to have a search application up and running four weeks early.</li>
</ul>
<p>So keep it simple, stay smart, stay up to date, and keep your search application on the straight-and-narrow. Seek (intelligently) and ye shall find.</p>
<hr /><p><strong><em>You might also be interested in:</em></strong></p>
<ul>
<li><a href="http://www.lucidimagination.com/Solutions/Webinars/Analyze-This-Tips-and-tricks-getting-LuceneSolr-Analyzer-index-and-search-your-content">Analyze This! Tips and tricks on getting the Lucene/Solr Analyzer to index and search your content right</a> &#8211; On-demand Webinar</li>
<li><a href="http://www.lucidimagination.com/solutions/Webinars/Apache-Solr-14-Faster-Easier-and-More-Versatile-Ever">Apache Solr 1.4: Faster, Easier and More Versatile than Ever</a> &#8211; On-demand Webinar</li>
<li><a href="http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr">Solr 1.4 Download</a></li>
<li><a href="http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr/Reference-Guide">Solr 1.4 Reference Guide</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Auto-Suggest From Popular Queries Using EdgeNGrams</title>
		<link>http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/</link>
		<comments>http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 01:48:22 +0000</pubDate>
		<dc:creator>Jay Hill</dc:creator>
				<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1016</guid>
		<description><![CDATA[<p>A popular feature of most modern search applications is the auto-suggest or auto-complete feature where, as a user types their query into a text box, suggestions of popular queries are presented. As each additional character is typed in by the user the list of suggestions is refined. There are several different approaches in Solr to provide this functionality, but we will be looking at an approach that involves using EdgeNGrams as part of the analysis &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>A popular feature of most modern search applications is the auto-suggest or auto-complete feature where, as a user types their query into a text box, suggestions of popular queries are presented. As each additional character is typed in by the user the list of suggestions is refined. There are several different approaches in Solr to provide this functionality, but we will be looking at an approach that involves using EdgeNGrams as part of the analysis chain. Two other approaches are to use either the TermsComponent (new in Solr 1.4) or faceting.</p>
<h2>N-grams and Edge N-grams</h2>
<p>An N-gram is an n-character substring of a longer sequence of characters. For example, the term &#8220;cash&#8221; is composed of the following n-grams:</p>
<ul>
<li>unigrams: &#8220;c&#8221;, &#8220;a&#8221;, &#8220;s&#8221;, &#8220;h&#8221;</li>
<li>bigrams: &#8220;ca&#8221;, &#8220;as&#8221;, &#8220;sh&#8221;</li>
<li>trigrams: &#8220;cas&#8221;, &#8220;ash&#8221;</li>
<li>4-grams: &#8220;cash&#8221;</li>
</ul>
<p>N-grams can be useful when substrings of terms need to be searched. An Edge n-gram is an n-gram built from one side or edge of a term. Edge n-grams for the term &#8220;cash&#8221; would be:</p>
<ul>
<li>&#8220;c&#8221;, &#8220;ca&#8221;, &#8220;cas&#8221;, &#8220;cash&#8221;</li>
</ul>
<p>It&#8217;s easy to see how edge n-grams could be used to suggest queries as a user types in a search query character by character.</p>
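<p>To make the idea concrete, here is a small Python sketch that produces the front-edge n-grams of a term. It only mirrors the concept; it is not how Solr implements its filter, and the function name and default sizes are assumptions chosen for illustration.</p>

```python
def edge_ngrams(term, min_gram=1, max_gram=25):
    """Return the front-edge n-grams of term, shortest first."""
    # Never ask for a gram longer than the term itself.
    longest = min(len(term), max_gram)
    return [term[:size] for size in range(min_gram, longest + 1)]


print(edge_ngrams("cash"))  # ['c', 'ca', 'cas', 'cash']
```

<p>Every prefix a user could type is itself one of the grams, which is why an exact match on a gram behaves like a prefix match on the whole term.</p>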
<h2>An Overview of the Process</h2>
<p>In order to provide query suggestions we will need to have typical queries entered by users available in a Solr index. It is a good practice to capture and analyze the queries that are being entered by the users of a search application. Ideally you might have a scheduled process to parse your Solr output logs to capture queries entered by users of the application. The queries might be stored in a relational database where they could be analyzed independently of your running production Solr instances. Another benefit of storing queries (and the number of times they have been entered) is that it is then possible to use Solr&#8217;s DataImportHandler to build an index of query information that can be used to power an auto-suggest feature. You might design an auto-suggest index as a separate core hosted in a single Solr instance along with a core for your main index. Note that it is probably not worth indexing queries that return zero results since we won&#8217;t want to suggest those to a user, so we&#8217;ll include a boolean field to let us know which queries return results. A minimal table design for our needs might look like this:</p>
<pre style="padding-left: 30px">create table autosuggest (
 query varchar(250),
 hasResults boolean,
 count int);</pre>
<p>More metadata could certainly be added as needed for reporting and analysis &#8211; these are just the columns we will need to demonstrate how to build the auto-suggest feature. As queries are parsed from the log files the process will need to query the main Solr index to determine whether the query has one or more results in order to populate the hasResults column.</p>
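<p>The log-parsing step is outside the scope of this article, but a rough Python sketch may help show the shape of the data that ends up in the table. It assumes each request log line contains a fragment like params={q=...} hits=N with URL-encoded parameters; check your own log format before relying on that. Instead of querying the main index, the sketch reads the hits value from the log line to fill hasResults, which is a shortcut and only works if your logs record it. The name tally_queries is hypothetical.</p>

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Assumed log shape: "... params={q=some+query} hits=12 status=0 QTime=1"
LOG_PATTERN = re.compile(r"params=\{(.*?)\} hits=(\d+)")


def tally_queries(log_lines):
    """Return (counts, has_results), both keyed by lower-cased query text."""
    counts = Counter()
    has_results = {}
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # not a request line we recognise
        found_hits = int(match.group(2)) != 0
        # parse_qs decodes the URL-encoded parameter string.
        for query in parse_qs(match.group(1)).get("q", []):
            key = query.strip().lower()
            counts[key] += 1
            has_results[key] = has_results.get(key, False) or found_hits
    return counts, has_results
```

<p>Each key in counts would become one row of the autosuggest table, with its tally in the count column and the matching flag in hasResults.</p>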
<p>Typically an AJAX front-end would be used to query the &#8220;auto-suggest&#8221; index. For the purposes of this article we won&#8217;t deal with how to build the AJAX component, nor will we go into details about the pre-processing involved in building the database table. Instead we will focus on how to configure Solr to index the queries and then search the index, with responses written in JSON format using Solr&#8217;s JSON response writer.</p>
<p>In general the steps we will take are:</p>
<ol>
<li>Parse log files to get a list of queries entered by users and load those queries (and any other useful metadata) into a database table. This should be an ongoing scheduled process.</li>
<li>Configure schema.xml.</li>
<li>Configure a dih-config.xml file.</li>
<li>Do a full-import with the DataImportHandler.</li>
<li>Build queries that can be used by an AJAX client.</li>
</ol>
<h2>Configuring schema.xml</h2>
<p>We will need to define a fieldType that doesn&#8217;t tokenize the query and does very minimal analysis. We can use the KeywordTokenizerFactory, which will leave our queries intact as a single term. We&#8217;ll include the LowerCaseFilterFactory to simplify input, and then, most importantly, we will include the EdgeNGramFilterFactory. The EdgeNGramFilterFactory will break up our query sources into a series of edge n-grams. We need to specify a minGramSize and a maxGramSize. For this example we&#8217;ll set the minimum to &#8220;1&#8221; and the maximum to &#8220;25&#8221;. Note that storing n-grams in an index will require some extra storage, but since our &#8220;auto-suggest&#8221; index will only contain two fields (and only one of them stored), and since we can probably expect a document count that is not too high, this should not be a problem.</p>
<p>Here is our definition for a fieldType to accomplish what we need:</p>
<pre>&lt;fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100"&gt;
 &lt;analyzer type="index"&gt;
   &lt;tokenizer class="solr.KeywordTokenizerFactory"/&gt;
   &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
   &lt;filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" /&gt;
 &lt;/analyzer&gt;
 &lt;analyzer type="query"&gt;
   &lt;tokenizer class="solr.KeywordTokenizerFactory"/&gt;
   &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
 &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>
<p>We should only need two fields: &#8220;user_query&#8221; to hold the query and its n-grams, and &#8220;count&#8221;, the number of times a query has been found in the logs, which we can sort on to present more popular queries higher in the suggested list:</p>
<pre>&lt;field name="user_query" type="edgytext"
  indexed="true" stored="true" omitNorms="true"
  omitTermFreqAndPositions="true" /&gt;
&lt;field name="count" type="int" indexed="true"
  stored="false" omitNorms="true"
  omitTermFreqAndPositions="true" /&gt;</pre>
<h2>Configuring dih-config.xml</h2>
<p>Solr&#8217;s DataImportHandler (DIH) is an extremely fast and efficient tool to use for indexing data in a relational database. There are two places where configuration needs to be done to enable the DIH. First, we need to set up a request handler in solrconfig.xml:</p>
<pre>&lt;requestHandler name="/indexer/autosuggest"
    class="org.apache.solr.handler.dataimport.DataImportHandler"&gt;
 &lt;lst name="defaults"&gt;
 &lt;str name="config"&gt;dih-config.xml&lt;/str&gt;
 &lt;/lst&gt;
 &lt;lst name="invariants"&gt;
 &lt;str name="optimize"&gt;false&lt;/str&gt;
 &lt;/lst&gt;
&lt;/requestHandler&gt;</pre>
<p>Then we need to create the dih-config.xml indicated in the request handler configuration. Create a file named &#8220;dih-config.xml&#8221; in the same directory as solrconfig.xml and schema.xml with the following contents:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;dataConfig&gt;
 &lt;dataSource
   type="JdbcDataSource"
   readOnly="true"
   driver="com.mysql.jdbc.Driver"
   url="jdbc:mysql://localhost:3306/myDatabase"
   user="user"
   password="password"/&gt;

  &lt;document name="autoSuggester"&gt;
    &lt;entity name="main"
       query="select query, count from autosuggest where hasResults = true"&gt;
     &lt;field column="query" name="user_query"/&gt;
     &lt;field column="count" name="count"/&gt;
   &lt;/entity&gt;
 &lt;/document&gt;
&lt;/dataConfig&gt;</pre>
<p>The dataConfig is pretty straightforward. We set up a connection to our datasource and then run a simple select query to get all records that have results. It would also be possible to set up a delta-import query but depending on how many user-entered queries you are indexing it may not be necessary. The DIH is very fast, and my tests were able to index about 75,000 documents a minute (on a MacBook Pro with a 2GB heap size), so running a full index once or several times a day may be easier. If you want to set up delta queries you would have to add a date field to the table that could be included in the delta queries. (For more details on DIH configuration see the wiki: <a href="http://wiki.apache.org/solr/DataImportHandler" target="_blank">http://wiki.apache.org/solr/DataImportHandler</a>)</p>
<h2>Indexing The Data</h2>
<p>Now we&#8217;re ready to index the data in the autosuggest table. This can be done in two different ways:</p>
<ol>
<li>Using a utility like &#8220;curl&#8221; or &#8220;wget&#8221;: curl 'http://localhost:8983/solr/indexer/autosuggest?command=full-import' (note: when developing and debugging you can add the parameter &#8220;&amp;rows=10&#8221;, or whatever number you would like, to limit the import.)</li>
<li>From the dataimport.jsp page: http://localhost:8983/solr/admin/dataimport.jsp. You should see a link to the handler defined in solrconfig.xml (which we named /indexer/autosuggest). Click on that link. The dih-config.xml file is displayed along with various buttons for different operations. Clicking on &#8220;Full-import&#8221; will fire off the full import process.</li>
</ol>
<p>Now we should have documents in our auto-suggest index. We can use the analysis page from the admin console (http://localhost:8983/solr/admin/analysis.jsp) to get a look at how the n-grams are created for a query. From the analysis page select &#8220;type&#8221; from the &#8220;Field&#8221; drop-down box and enter a value of &#8220;edgytext&#8221;, the fieldType we defined in schema.xml. Check &#8220;verbose output&#8221; in the &#8220;Field value&#8221; section and enter &#8220;i&#8217;m not down&#8221; in the text area. Click &#8220;Analyze&#8221; and observe how the EdgeNGramFilterFactory breaks up our query into n-grams:</p>
<p><img class="alignnone size-full wp-image-1027" src="http://www.lucidimagination.com/blog/wp-content/uploads/2009/09/analysis.png" alt="analysis" width="590" height="342" /></p>
<h2>Running Some Queries</h2>
<p>Now we can do some testing to see if we get the results we expect. My test index was for a music site where users are often entering song titles as keyword searches. Let&#8217;s say I&#8217;m searching for the song &#8220;I&#8217;m Not Down&#8221;. The first character typed should trigger the following query sent from the AJAX front-end: http://localhost:8983/solr/select/?q=user_query:&quot;i&quot;&amp;wt=json&amp;fl=user_query&amp;indent=on&amp;echoParams=none&amp;rows=10&amp;sort=count desc</p>
<p>Note that we are using the JSON response writer (&amp;wt=json), asking only for the field we&#8217;re interested in (&amp;fl=user_query), turning off echoParams (&amp;echoParams=none) to keep the response as small as possible, and sorting on the count field descending (&amp;sort=count desc). (&amp;indent=on is set for clean display, but this should be omitted for a production site.) Since we entered the count values for each query into the database we can sort on that to get the most popular searches at the top of our results. The response for this query with only the first character entered looks like this:</p>
<pre>{
 "responseHeader":{
 "status":0,
 "QTime":1},
 "response":{"numFound":12,"start":0,"docs":[
 {
 "user_query":"i'm only sleeping"},
 {
 "user_query":"i'm glad"},
 {
 "user_query":"i'm not down"},
 {
 "user_query":"i'm a believer"},
 {
 "user_query":"i'm not your stepping stone"},
 {
 "user_query":"i'm not in love"},
 {
 "user_query":"i'm shakin'"},
 {
 "user_query":"i've been everywhere"},
 {
 "user_query":"i'm hurtin'"},
 {
 "user_query":"i'm in the mood for love"}]
 }}</pre>
<p>Let&#8217;s jump ahead to the fifth character entered: &#8220;i&#8217;m n&#8221; &#8211; now the results are more restricted:</p>
<pre>{
 "responseHeader":{
 "status":0,
 "QTime":4},
 "response":{"numFound":3,"start":0,"docs":[
 {
 "user_query":"i'm not down"},
 {
 "user_query":"i'm not your stepping stone"},
 {
 "user_query":"i'm not in love"}]
 }}</pre>
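<p>A client only needs the user_query values from responses like the ones above. The following Python sketch shows one way to pull them out of the JSON body; the function name extract_suggestions is hypothetical, and the sample string is a trimmed copy of the response shown above.</p>

```python
import json


def extract_suggestions(response_text, field="user_query"):
    """Return the suggestion strings from a Solr JSON response body."""
    body = json.loads(response_text)
    docs = body.get("response", {}).get("docs", [])
    # Skip any document that lacks the requested field.
    return [doc[field] for doc in docs if field in doc]


sample = """{"response": {"numFound": 3, "start": 0, "docs": [
  {"user_query": "i'm not down"},
  {"user_query": "i'm not your stepping stone"},
  {"user_query": "i'm not in love"}]}}"""
print(extract_suggestions(sample))
```

<p>Because the request already sorts on count descending, the list comes back in popularity order and can be shown to the user as-is.</p>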
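<p>Note that it&#8217;s necessary to wrap the query in double-quotes as a phrase. Otherwise the query parser splits the input on whitespace and only the first word is matched against user_query, so unpredictable and unwanted matches can occur.</p>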
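<p>The AJAX component is not covered here, but the request it sends can be sketched in a few lines of Python. The sketch below builds the URL used in the examples above and applies the phrase quoting just described. The function name suggest_url is hypothetical, and escaping embedded backslashes and double quotes is an added precaution that the original queries did not need.</p>

```python
from urllib.parse import urlencode

# Base URL taken from the examples above; adjust host, port and path as needed.
SELECT_URL = "http://localhost:8983/solr/select/"


def suggest_url(typed_text, rows=10):
    """Build the auto-suggest request for the text typed so far."""
    # Escape backslashes and double quotes so the phrase stays intact.
    escaped = typed_text.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", 'user_query:"' + escaped + '"'),
        ("wt", "json"),
        ("fl", "user_query"),
        ("echoParams", "none"),
        ("rows", str(rows)),
        ("sort", "count desc"),
    ]
    return SELECT_URL + "?" + urlencode(params)


print(suggest_url("i'm n"))
```

<p>Letting urlencode handle the escaping avoids hand-building the query string, which is where quoting mistakes tend to creep in.</p>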
<h2>In Conclusion</h2>
<p>We&#8217;ve shown one way to implement an auto-suggest feature. But no matter which approach you take it&#8217;s important to know what kinds of queries your users are entering, and how to index that data in such a way that it improves the end-user experience for your search application. Parsing log files to capture queries and storing them in a database not only makes it easy to index your query data for auto-suggestions, but the data can also be useful for reporting and analysis, as well as building query lists for load testing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
	</channel>
</rss>

