<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; LucidWorks</title>
	<atom:link href="http://www.lucidimagination.com/blog/category/lucidworks/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Solr and LucidWorks feature matrix available</title>
		<link>http://www.lucidimagination.com/blog/2012/01/03/solr-and-lucidworks-feature-matrix-available/</link>
		<comments>http://www.lucidimagination.com/blog/2012/01/03/solr-and-lucidworks-feature-matrix-available/#comments</comments>
		<pubDate>Tue, 03 Jan 2012 21:51:08 +0000</pubDate>
		<dc:creator>Cassandra Targett</dc:creator>
				<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4589</guid>
		<description><![CDATA[<p>We get asked a lot by customers what&#8217;s in a new Solr/Lucene release that applies to them, and with our own LucidWorks Platform available, customers naturally want to know what they&#8217;ll get that they don&#8217;t already have. If you&#8217;re happily running along on Solr 1.4, why or when should you update to a newer version? Should you migrate to LucidWorks?</p>
<p>So we decided to try to put together a matrix of major features and show &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>We get asked a lot by customers what&#8217;s in a new Solr/Lucene release that applies to them, and with our own LucidWorks Platform available, customers naturally want to know what they&#8217;ll get that they don&#8217;t already have. If you&#8217;re happily running along on Solr 1.4, why or when should you update to a newer version? Should you migrate to LucidWorks?</p>
<p>So we decided to try to put together a matrix of major features and show in which versions they are available. Solr 1.4 is pretty old by now, so it naturally appears not to hold up well against Solr 3.5, Solr Trunk, or LucidWorks. Think of it as the base from which the later features in the list grow.</p>
<p>This was an interesting exercise to work through. It&#8217;s easy to read through the changes.txt for each release and try to include everything in a list such as this (and our Support guys are probably disappointed that I didn&#8217;t do that), but I tried to keep it to the major innovations or bug fixes so it stays somewhat readable. But there&#8217;s always the question of whether it&#8217;s too much or too little detail.</p>
<p>I hope it&#8217;s useful and we&#8217;d like to know what you think. Is it worthwhile? Should we go into deeper detail? Could the features use more explanation? Look it over at <a href="http://www.lucidimagination.com/devzone/references/feature-matrix-solr-and-lucidworks">Feature Matrix for Solr and LucidWorks</a> and share your suggestions in the comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2012/01/03/solr-and-lucidworks-feature-matrix-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Monitoring Apache Solr and LucidWorks with Zabbix</title>
		<link>http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/</link>
		<comments>http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/#comments</comments>
		<pubDate>Sun, 02 Oct 2011 19:42:16 +0000</pubDate>
		<dc:creator>alexey</dc:creator>
				<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4061</guid>
		<description><![CDATA[<p>If you&#8217;re running Apache Solr in production, you count on it to deliver solid performance and expect it to be up at all times. Even if you tested your setup with expected data and query load, things can go wrong. Solving those problems as they appear, not only causes service downtime, but is a very unpleasant task. Imagine sleepless nights trying to figure out why your production system went down with an OutOfMemory error. Similar &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re running Apache Solr in production, you count on it to deliver solid performance and expect it to be up at all times. Even if you tested your setup with expected data and query load, things can go wrong. Solving those problems as they appear not only causes service downtime but is also a very unpleasant task. Imagine sleepless nights trying to figure out why your production system went down with an OutOfMemory error. Similar situations are actually more common than desired &#8211; no free disk space, running out of file descriptors, no free memory for the OS-level file system cache, high CPU load and so forth.</p>
<p>There is a special class of software, called monitoring software, that is widely used among system and network administrators. In our case we would like to monitor not only OS-level metrics, but also Solr internal parameters, and act accordingly. LucidWorks and Apache Solr provide lots of valuable information through a JMX interface, so you can hook that up to your monitoring tool.</p>
<p>Zabbix is one of the most popular open source monitoring tools. It has many features, such as an easy-to-use web interface, different ways to gather metrics data, the ability to keep this data in persistent storage, built-in graphing, notifications and alerting, flexible configuration and many more. One of its most compelling features for integration with Apache Solr is built-in JMX support (available only in the Zabbix 2.0 beta release). Using this feature you can easily configure the Zabbix server to pull JMX metrics out of any LucidWorks or Solr application. Because all configuration settings (JMX attributes, graphs, triggers) are stored centrally on the Zabbix server, you can add a new attribute for all monitored servers or change the polling frequency with a single click.</p>
<p>Here are example graphs you can build in Zabbix:</p>
<p><em>1. Total number of documents in Solr index</em><br />
<a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/TotalNumberOfDocuments1.png"><img class="alignleft" style="padding-left: 0em; padding-top: 0.5em; padding-right: 0em; padding-bottom: 0.5em;" title="Total Number Of Documents" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/TotalNumberOfDocuments1.png" alt="" width="450" height="136" /></a></p>
<p><em>2. Search activity &#8211; number of search requests, errors and timeouts  </em><br />
Solr request handlers provide a cumulative counter for the number of requests, <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SearchActivity1.png"><img class="alignright" style="padding-left: 0.5em; padding-top: 0.5em; padding-right: 1em; padding-bottom: 0.5em;" title="Search Activity" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SearchActivity1.png" alt="" width="225" height="88" /></a> but you are probably more interested in the number of search requests per specific period of time, such as per minute or per second. The trick here is that Zabbix provides a way to set up monitoring to store not the value as-is, but as a delta (simple change value or speed per second).</p>
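<p>To make the delta idea concrete, here is a minimal Python sketch of the arithmetic behind a "speed per second" value. It is an illustration only, not Zabbix code: the function name, the sample numbers and the choice to return None when the counter goes backwards (for example after a restart) are all assumptions.</p>

```python
def per_second_rate(prev_value, prev_time, curr_value, curr_time):
    """Turn two samples of a cumulative counter into a per-second rate."""
    elapsed = curr_time - prev_time
    if not elapsed > 0:
        raise ValueError("samples must be in increasing time order")
    if prev_value > curr_value:
        # Counter went backwards, e.g. after a restart: no meaningful delta.
        return None
    return (curr_value - prev_value) / elapsed


# Two hypothetical samples of a request counter taken 60 seconds apart.
rate = per_second_rate(prev_value=12000, prev_time=0, curr_value=12300, curr_time=60)
print(rate)  # 5.0
```

<p>The "simple change" variant is the same calculation without the division by elapsed time.</p>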
<p><em>3. Solr document operations (adds, deletes by id or query)</em><br />
<a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SolrDocumentOperations1.png"><img class="alignleft" style="padding-left: 0em; padding-top: 0.5em; padding-right: 0em; padding-bottom: 0.5em;" title="Solr Document Operations" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SolrDocumentOperations1.png" alt="" width="450" height="136" /></a></p>
<p><em>4. Crawling activity</em><br />
LucidWorks provides different connectors/crawlers which you can use to <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/CrawlingActivity1.png"><img class="alignright" style="padding-left: 0.5em; padding-top: 0.5em; padding-right: 1em; padding-bottom: 0.5em;" title="Crawling Activity" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/CrawlingActivity1.png" alt="" width="225" height="88" /></a> index documents into Solr. It also provides additional statistics about crawler behavior, such as the total number of documents, new and deleted documents, the number of updated documents in an iterative crawl, failures, etc.</p>
<p><em>5. Solr index operations (commits, optimizes, rollbacks)</em><br />
<a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SolrIndexOperations1.png"><img class="alignleft" style="padding-left: 0em; padding-top: 0.5em; padding-right: 0em; padding-bottom: 0.5em;" title="Solr Index Operations" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SolrIndexOperations1.png" alt="" width="450" height="136" /></a></p>
<p><em>6. Search Average Response Time</em><br />
<a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SearchAverageResponseTime.png"><img class="alignright" style="padding-left: 0.5em; padding-top: 0.5em; padding-right: 1em; padding-bottom: 0.5em;" title="Search Average Response Time" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SearchAverageResponseTime.png" alt="" width="225" height="88" /></a>The Solr search request handler provides a cumulative avgTimePerRequest value. The problem with this attribute is that when your application has been running in production for a significant amount of time, current short-term performance problems won&#8217;t have a significant effect on this aggregate metric. The solution is to use a Zabbix <em>calculated</em> item on the delta change of the <em>totalTime</em> and <em>requests</em> attributes. Here&#8217;s a math expression to calculate the average search response time for the last 5 minutes:</p>
<pre>sum("jmx[\"solr/collection1:type=/lucid,id=org.apache.solr.handler.StandardRequestHandler\",\"totalTime\"]",300)/sum("jmx[\"solr/collection1:type=/lucid,id=org.apache.solr.handler.StandardRequestHandler\",\"requests\"]",300)</pre>
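<p>The expression above divides the time spent on requests in the window by the number of requests in the same window. A small Python sketch of that arithmetic follows; the function name and the sample deltas are made up for illustration, and the unit of totalTime is assumed to be milliseconds.</p>

```python
def windowed_average(total_time_deltas, request_deltas):
    """Average time per request over a window, from per-sample deltas."""
    requests = sum(request_deltas)
    if requests == 0:
        return None  # no traffic in the window, so no average
    return sum(total_time_deltas) / requests


# Hypothetical deltas collected once a minute over the last 5 minutes.
time_deltas_ms = [400.0, 350.0, 900.0, 4200.0, 150.0]
request_deltas = [40, 35, 45, 60, 20]
print(windowed_average(time_deltas_ms, request_deltas))  # 30.0
```

<p>Note how the slow fourth minute (70 ms per request) is clearly visible in a 5-minute window, while it would barely move an average accumulated over weeks of uptime.</p>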
<p>&nbsp;</p>
<p><em>7. Solr searcher warmup time</em><br />
This is an important metric <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SearcherWarmupTime1.png"><img class="alignright" style="padding-left: 0.5em; padding-top: 0.5em; padding-right: 1em; padding-bottom: 0.5em;" title="Searcher Warmup Time" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/SearcherWarmupTime1.png" alt="" width="225" height="88" /></a>if you pursue a fast commit rate (near real time indexing) and don&#8217;t want to sacrifice fast faceting performance. You can configure the monitoring tool to send an alert when warmup time exceeds a predefined threshold.</p>
<p><em>8. Filter, query result and document cache statistics (cache size, hits, hitratio, evictions, etc.)</em></p>
<p class="alignleft"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/FilterCacheSize1.png"><img style="padding-left: 0em; padding-top: 0.5em; padding-right: 0.5em; padding-bottom: 0.5em;" title="Filter Cache Size" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/FilterCacheSize1.png" alt="" width="122" height="44" /></a> <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/FilterCacheHitRatio1.png"><img style="padding-left: 0em; padding-top: 0.5em; padding-right: 0.5em; padding-bottom: 0.5em;" title="Filter Cache Hit Ratio" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/FilterCacheHitRatio1.png" alt="" width="122" height="44" /></a> <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/DocumentCacheSize1.png"><img style="padding-left: 0em; padding-top: 0.5em; padding-right: 0.5em; padding-bottom: 0.5em;" title="Document Cache Size" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/DocumentCacheSize1.png" alt="" width="122" height="44" /></a> <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/DocumentCacheHitRatio1.png"><img style="padding-left: 0em; padding-top: 0.5em; padding-right: 0.5em; padding-bottom: 0.5em;" title="Document Cache Hit Ratio" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/DocumentCacheHitRatio1.png" alt="" width="122" height="44" /></a> <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/QueryResultCacheSize.png"><img style="padding-left: 0em; padding-top: 0.5em; padding-right: 0.5em; padding-bottom: 0.5em;" title="Query Result Cache Size" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/QueryResultCacheSize.png" alt="" width="122" height="44" /></a> <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/QueryResultCacheHitRatio.png"><img style="padding-left: 0em; padding-top: 0.5em; padding-right: 0.5em; padding-bottom: 
0.5em;" title="Query Result Cache Hit Ratio" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/QueryResultCacheHitRatio.png" alt="" width="122" height="44" /></a></p>
<p><em>9.  Java Heap Memory Usage</em><br />
<a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/HeapMemoryUsage1.png"><img class="alignleft" style="padding-left: 0em; padding-top: 0.5em; padding-right: 0em; padding-bottom: 0.5em;" title="Heap Memory Usage" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/09/HeapMemoryUsage1.png" alt="" width="450" height="136" /></a></p>
<p><strong>How would I know if my search server is down?</strong></p>
<p>There are two options &#8211; the obvious one is to set up your monitoring tool to issue search requests and verify the response status or specific text on a search results page. Another option is to check the last time your monitoring tool retrieved an arbitrary JMX attribute from your application and assume the server is down if that was longer ago than expected. In Zabbix there&#8217;s a special function, <a href="http://www.zabbix.com/documentation/1.8/manual/config/triggers#example_8">nodata</a>, which you can use to achieve that.</p>
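<p>The second option boils down to a staleness check. The following Python sketch shows the idea only; it is not how Zabbix implements nodata, and the function name, timestamps and 300-second threshold are assumptions chosen for the example.</p>

```python
def looks_down(last_seen, now, max_silence_seconds):
    """Return True when no sample has arrived within the allowed silence."""
    return now - last_seen > max_silence_seconds


# Timestamps are plain seconds; the last JMX sample arrived at t=1000.
print(looks_down(last_seen=1000, now=1100, max_silence_seconds=300))  # False
print(looks_down(last_seen=1000, now=1400, max_silence_seconds=300))  # True
```

<p>The threshold should be comfortably larger than the polling interval, otherwise a single delayed sample will raise a false alarm.</p>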
<p><strong>How would I know if I&#8217;m reaching the limits of my server so that I can react proactively?</strong></p>
<p>This is a complex issue as there are many things that can go wrong (such as JVM heap memory, CPU load, disk space, file descriptors, etc.) and you should monitor them all. Zabbix has great example templates for OS and Java triggers that allow you to keep an eye on all those parameters.</p>
<p>For more information about Solr and LucidWorks JMX support, instructions on how to configure Zabbix and Nagios, Zabbix configuration templates and other helpful tips, please see the <a href="http://lucidworks.lucidimagination.com/display/lweug/Integrating+Monitoring+Services">Integrating Monitoring Services</a> section on the Lucid documentation portal.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/10/02/monitoring-apache-solr-and-lucidworks-with-zabbix/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Announcing LucidWorks Enterprise 1.7 General Availability</title>
		<link>http://www.lucidimagination.com/blog/2011/04/06/announcing-lucidworks-enterprise-1-7-general-availability/</link>
		<comments>http://www.lucidimagination.com/blog/2011/04/06/announcing-lucidworks-enterprise-1-7-general-availability/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 16:59:10 +0000</pubDate>
		<dc:creator>Sarath Jarugula</dc:creator>
				<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3307</guid>
		<description><![CDATA[<p>LucidWorks Enterprise 1.7 release is officially announced today. You can find the announcement <a href="http://www.lucidimagination.com/About/Company-News/Lucid-Imagination-Shapes-Future-Enterprise-Search-LucidWorks-Enterprise-17">here</a>. LucidWorks Enterprise 1.7 is a quick successor to LucidWorks Enterprise 1.6, released in Dec 2010. This release builds on our promise to enable enterprises to build search applications easily with Solr, the world’s leading open source search. You can download the LucidWorks Enterprise 1.7 <a href="http://www.lucidimagination.com/lwe/download">here</a>.</p>
<p>Our main objective with the 1.7 release is to provide you with improved enterprise &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>LucidWorks Enterprise 1.7 release is officially announced today. You can find the announcement <a href="http://www.lucidimagination.com/About/Company-News/Lucid-Imagination-Shapes-Future-Enterprise-Search-LucidWorks-Enterprise-17">here</a>. LucidWorks Enterprise 1.7 is a quick successor to LucidWorks Enterprise 1.6, released in Dec 2010. This release builds on our promise to enable enterprises to build search applications easily with Solr, the world’s leading open source search. You can download the LucidWorks Enterprise 1.7 <a href="http://www.lucidimagination.com/lwe/download">here</a>.</p>
<p>Our main objective with the 1.7 release is to provide you with improved enterprise readiness as well as core search enhancements, in particular <strong>Search enhancements from Solr 4.x trunk</strong>.</p>
<p>The rapid rate of innovation in the Solr/Lucene development community is a double-edged sword. The Apache Solr/Lucene 3.1 release just came out last week. However, there are already several bug fixes, performance improvements, and features contributed by the Solr community in 4.x trunk. It is hard to keep up with the nightly builds to take advantage of the community contributions. Our objective at Lucid is to give you more frequent and certified versions of search functionality. LucidWorks Enterprise 1.7 contains a fully tested and certified version of the Solr 4.x branch. You no longer have to try this at home; we do it for you. You can build your search applications leveraging some of the latest Solr search improvements with LucidWorks Enterprise.</p>
<p>Other highlights of this release include:</p>
<h2>New in LucidWorks Enterprise 1.7</h2>
<ul>
<li><strong>Search result grouping / Field Collapsing</strong> improves user experience and simplifies development by grouping multiple similar results as a single entry; for example, products within a price range, multiple emails in a single threaded conversation, or parts within a category. It also provides the ability to group by query, which retrieves the top documents that match the query, not just the count. Additional info: (<a href="../2010/09/16/2446/">http://www.lucidimagination.com/blog/2010/09/16/2446/</a>)</li>
<li><strong>Integration with UIMA for metadata extraction</strong> improves versatility of metadata driven analytics and operations, such as using Pivot Faceting. It treats each facet as a dimension, creating facet counts for a multi-dimensional matrix. For example, for each category in a store inventory, it can show how many products are in stock within the facet controls, rather than just the documents retrieved. (<a href="http://wiki.apache.org/solr/SolrUIMA">http://wiki.apache.org/solr/SolrUIMA</a>)</li>
<li><strong>Sort result sets by functions</strong>, so you can tweak the search relevancy result set using functions to create relevancy algorithms within the result of a query, eliminating the need to write custom rankers for common scenarios. (<a href="http://yonik.wordpress.com/2011/03/10/solr-relevancy-function-queries/">http://yonik.wordpress.com/2011/03/10/solr-relevancy-function-queries/</a>)</li>
<li><strong>Spell check against the existing index</strong> improves efficiency by saving the need to build a special dictionary for the spellcheck component, as it automatically checks entries without having to create explicit misspelled entries</li>
<li><strong>Numeric range facets (similar to date faceting)</strong> organize and analyze data and content based on quantitative parameters, such as size, dimensions, prices, etc.</li>
<li><strong>New spatial search, including spatial filtering, boosting and sorting capabilities</strong> better integrate  location-based information into search applications</li>
<li>A new <strong>Auto Suggest component</strong> simplifies rapid retrieval of candidate results interactively as the user types</li>
<li><strong>Fine-grained control of data acquisition timing and sources,</strong> both for application development and ongoing data-management and indexing for production search applications, simplifying integrated crawling configuration and management</li>
<li><strong>SharePoint CMS Connector</strong> A new connector for <strong>SharePoint</strong>, which lets you crawl and index the content of your SharePoint server as easily as other conventional content management systems, web servers, and databases, right from the LucidWorks Enterprise console (for Microsoft Office SharePoint Server 2007, Microsoft Windows SharePoint Services 3.0, SharePoint 2010); it also seamlessly integrates SharePoint document and user level ACLs directly from the LucidWorks Enterprise UI.</li>
<li><strong>Security Enhancements</strong> LucidWorks Enterprise 1.7 improves integration and validation of security into search application development and deployment, with streamlined control of user privileges, groups, and LDAP integration. LDAP activation, configuration and validation can be done via the UI, in addition to the REST API, to simplify development and streamline production integration.</li>
</ul>
<p>LucidWorks Enterprise is free to download for unlimited development and test use; production deployment requires an <a href="http://www.lucidimagination.com/lwe/subscriptions-and-pricing">enterprise subscription</a>. Subscriptions include production deployment support, business hours or 24/7 SLA-based incident management support and ExpertLink on-demand consulting. The upgrade is free for current customers.</p>
<p>You can download the product <a href="http://www.lucidimagination.com/lwe/download">here</a>. For the full set of functionality, refer to the documentation <a href="http://lucidworks.lucidimagination.com/">here</a>. Of course, we welcome your input &#8212; help us improve the product by providing your feedback and suggestions in our <a href="http://www.lucidimagination.com/forum/">Forums</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/04/06/announcing-lucidworks-enterprise-1-7-general-availability/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Implementing the Ecommerce Checklist with Apache Solr and LucidWorks</title>
		<link>http://www.lucidimagination.com/blog/2011/01/25/implementing-the-ecommerce-checklist-with-apache-solr-and-lucidworks/</link>
		<comments>http://www.lucidimagination.com/blog/2011/01/25/implementing-the-ecommerce-checklist-with-apache-solr-and-lucidworks/#comments</comments>
		<pubDate>Tue, 25 Jan 2011 15:24:12 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[ecommerce]]></category>
		<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Grant Ingersoll]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2351</guid>
		<description><![CDATA[<h1>Introduction</h1>
<p>During a past <a href="http://www.lucidimagination.com/blog/2010/04/06/webinar-e-commerce/">ecommerce webinar</a> with Brian Doll of <a href="http://www.sheetmusicplus.com">Sheetmusicplus.com</a>,  I posted a checklist of items that are commonly occurring in many  ecommerce applications and then I waved my hands, due to time  constraints, and said Solr (and now <a href="http://www.lucidimagination.com/lwe/download">LucidWorks</a>) can do almost all of them out of the box and  left the rest as an exercise for the reader.  (Note, the slides are  available <a href="http://www.lucidimagination.com/files/file/Lucid-Sheetmusic-Solr-ECommercePerformance.pdf">here</a>.   Registration required.)  Well, now I &#8230;</p>]]></description>
			<content:encoded><![CDATA[<h1>Introduction</h1>
<p>During a past <a href="http://www.lucidimagination.com/blog/2010/04/06/webinar-e-commerce/">ecommerce webinar</a> with Brian Doll of <a href="http://www.sheetmusicplus.com">Sheetmusicplus.com</a>, I posted a checklist of items that commonly occur in many ecommerce applications and then I waved my hands, due to time constraints, and said Solr (and now <a href="http://www.lucidimagination.com/lwe/download">LucidWorks</a>) can do almost all of them out of the box and left the rest as an exercise for the reader. (Note, the slides are available <a href="http://www.lucidimagination.com/files/file/Lucid-Sheetmusic-Solr-ECommercePerformance.pdf">here</a>. Registration required.) Well, now I have some time, so let me fill in the blanks with some more concrete examples of how to do this.</p>
<h1>Setup</h1>
<p>For this example, I am using real estate data freely available from the <a href="http://www.nyc.gov/html/dof/html/property/property_val_sales.shtml">NYC government</a>. I am interested in this data for the following reasons:</p>
<ol>
<li>It&#8217;s free.</li>
<li>It has product-like data in it, as in: name, description, a bunch of metadata and price.</li>
<li>It&#8217;s mostly real (I embellished it with descriptions and a few other pieces and filled in some missing pieces of data, see the Indexer class in the source code.) In fact, it&#8217;s so real that when setting up the app, one quickly sees how noisy the data is in terms of things like missing values, etc. For instance, 1804 records don&#8217;t have the year built specified.</li>
</ol>
<p>I have set up a Solr schema for this data as well as some tools for indexing the data. To run the demo, you will need:</p>
<ol>
<li>Java 1.6</li>
<li>Ant 1.7.X</li>
<li>The <a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/01/ecommerce.zip">ecommerce.zip</a> download (73 MB)</li>
</ol>
<p>Once you have the prerequisites in place, take the following steps:</p>
<ol>
<li>Unzip the ecommerce.zip file into the directory of your choice</li>
<li>cd lucid_ecom</li>
<li>In a separate terminal window: cd solr
<ol>
<li>java -jar start.jar (just as if you were running the Solr tutorial.   Note, I am running a relatively recent version of the Solr 3.x branch)</li>
</ol>
</li>
<li>Point your web browser at http://localhost:8983/solr/nyc and take a moment to familiarize yourself with the interface.</li>
</ol>
<p>Once you have completed step 4, you should see something like:</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/04/lucid_real_estate.png"><img class="size-full wp-image-2299 alignnone" title="Lucid Real Estate Screenshot" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/04/lucid_real_estate.png" alt="Lucid Real Estate Screenshot" width="936" height="430" /></a></p>
<p>(NOTE: I&#8217;m not a graphic designer.  I tried to create a reasonable UI  w/o spending a ton of time on every last piece of it.  Also, I used the <a href="http://wiki.apache.org/solr/VelocityResponseWriter"> VelocityResponseWriter</a> built into Solr.  It&#8217;s nice for prototyping, but  it &#8220;ain&#8217;t&#8221; for production use.)</p>
<p>A pre-built index is included in the Zip file, but if you wish to index it yourself, run:</p>
<ol>
<li>ant delete-all (deletes the existing content)</li>
<li>ant index</li>
</ol>
<p>With the working application in place, let&#8217;s take a look at how to implement the various checklist items.</p>
<h1>Implementing the Checklist</h1>
<p>I&#8217;ve broken out each checklist item below and will cover each of them in more detail in the following subsections.</p>
<h2>Keyword search</h2>
<p>There really isn&#8217;t much to be said here other than that Solr has built-in support for querying in all the &#8220;usual&#8221; ways that one would expect out of a search engine: keywords, phrases, wildcards, fielded search and much, much more. For example, try:</p>
<ol>
<li><a href="http://localhost:8983/solr/nyc?q=tottenville">http://localhost:8983/solr/nyc?q=tottenville</a> or just type tottenville in the search box.</li>
<li><a href="http://localhost:8983/solr/nyc?q=5+bedrooms+%22Staten+Island%22">http://localhost:8983/solr/nyc?q=5+bedrooms+%22Staten+Island%22</a> (5 bedrooms &#8220;Staten Island&#8221;)</li>
<li><a href="http://localhost:8983/solr/nyc?q=5+bedrooms+borough_display%3ABro*">http://localhost:8983/solr/nyc?q=5+bedrooms+borough_display%3ABro*</a> (5 bedrooms borough_display:Bro* &#8212; Should match all 5 bedrooms in either the Bronx or Brooklyn)</li>
</ol>
<p>Take some time and try out your own queries.  In our example, we are using the <a href="http://wiki.apache.org/solr/DisMaxQParserPlugin">extended Dismax Query Parser</a>, in case you want to learn more about how it works.</p>
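<p>If you would rather generate such links from a script than encode them by hand, the Python standard library can do the escaping. This is a small sketch; the helper name search_url is made up, and the base URL is the local demo address used above.</p>

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/nyc"


def search_url(query):
    """Build a demo search link with the query string properly escaped."""
    # safe="*" keeps the wildcard readable, matching the links above.
    return BASE_URL + "?" + urlencode({"q": query}, safe="*")


print(search_url('5 bedrooms "Staten Island"'))
# http://localhost:8983/solr/nyc?q=5+bedrooms+%22Staten+Island%22
print(search_url("5 bedrooms borough_display:Bro*"))
# http://localhost:8983/solr/nyc?q=5+bedrooms+borough_display%3ABro*
```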
<h2>High Quality relevance (precision @ &lt; 10)</h2>
<p>In many search applications, and ecommerce is no exception, users often abandon searches when the first page of results (often the top 10) is not relevant to their query. Thus, it is important that a search engine return good results on the first page. While some guidance (more on this in the coming sections) can help alleviate the abandonment problem, a strong first showing is often the quickest way to more clickthroughs. Since Solr utilizes Lucene, which implements an industry standard vector space approach to search, results are often quite good out of the box. Nevertheless, many ecommerce applications may need one or more of the tools that Solr/Lucene provide to tweak relevance, such as:</p>
<ol>
<li>Document, field, token boosting (e.g. matches in the title field are more important than matches in the description.)</li>
<li>Query term boosting (provide weights for different terms, such as synonyms.)</li>
<li>Disjunction Maximum Query scoring (aka the &#8220;dismax&#8221; parser or the extended dismax parser) for dealing with cross field matches.</li>
<li>Automatic phrase generation from multiword queries even when the user did not explicitly quote the keywords.</li>
<li>The ability to override low-level scoring information such as term  frequency, document frequency, document length normalization and  coordination factors.</li>
<li>Function queries (more later) to allow values in fields (such as price) to be factors in scoring.</li>
<li>Editorial Boosting/Sponsored Results (in Solr-speak it&#8217;s called the  QueryElevationComponent &#8212; more later) to place specific results at the  top.</li>
</ol>
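<p>As a rough illustration of items 1 and 3, the sketch below builds request parameters for the extended dismax parser with per-field boosts. defType and qf are standard Solr parameters, but the helper name, the field names title and description, and the boost values are assumptions for this example and are not taken from the demo schema.</p>

```python
from urllib.parse import urlencode


def boosted_params(query, field_boosts):
    """Build edismax parameters where each field carries its own boost."""
    # qf lists the fields to search, each with a ^boost suffix.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"q": query, "defType": "edismax", "qf": qf}


# Hypothetical fields: title matches count twice as much as description.
params = boosted_params("garden apartment", {"title": 2.0, "description": 1.0})
print(params["qf"])  # title^2.0 description^1.0
print(urlencode(params))
```

<p>Whatever boosts you pick, check them against a representative set of queries rather than a single one.</p>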
<p>Relevance tuning is a complex subject and one that is best viewed in the light of your data. In summary, make sure you are making decisions about relevancy based on the big picture and try to avoid any local minima (i.e. tuning a specific query to the detriment of lots of other queries.) In other words, make sure your top money making queries aren&#8217;t affected by you &#8220;fixing&#8221; one or two bad queries. To learn more, see my articles on <a href="http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Optimizing-Findability-Lucene-and-Solr">Improving Findability</a> and <a href="http://www.lucidimagination.com/search/out?u=http%3A%2F%2Fwww.lucidimagination.com%2FCommunity%2FHear-from-the-Experts%2FArticles%2FDebugging-Relevance-Issues-Search">Debugging Relevance</a>. With the basics out of the way, it&#8217;s time to take a look at faceting and discovery tools.</p>
<h2>Faceting/Discovery</h2>
<p>One of Solr&#8217;s most appealing features is its out of the box support for faceting (sometimes called navigators, parametric search or guided navigation) in a number of different ways (see <a href="http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr">http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr</a> for a primer.  Also see <a href="http://wiki.apache.org/solr/SimpleFacetParameters">http://wiki.apache.org/solr/SimpleFacetParameters</a>.)  In the example application, the left hand nav area shows facets for things like borough (field based faceting), sale price (numeric range faceting), sale date (date range faceting) and pet friendly (facet by query).  Solr also supports &#8220;multi-select&#8221; faceting (see <a href="http://search.lucidimagination.com">http://search.lucidimagination.com</a> for an example.)  And, while there isn&#8217;t support for true hierarchical faceting in Solr yet, there are ways to achieve it through intelligent modeling of your tokens.  Last, but not least, you may find <a href="https://issues.apache.org/jira/browse/SOLR-792">https://issues.apache.org/jira/browse/SOLR-792</a> useful for doing grouped faceting (color: red, size: large).</p>
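<p>To make the four facet types concrete, here is a hedged sketch of request handler defaults that would produce them.  The field names (borough, sale_price, sale_date, pet_friendly) and the ranges are my guesses at what such a schema might look like, not the demo&#8217;s actual settings, and facet.range assumes a Solr version that includes range faceting.</p>
<blockquote>
<pre>&lt;lst name="defaults"&gt;
  &lt;str name="facet"&gt;true&lt;/str&gt;
  &lt;!-- field based faceting --&gt;
  &lt;str name="facet.field"&gt;borough&lt;/str&gt;
  &lt;!-- numeric range faceting: buckets of 250,000 from 0 to 2,000,000 --&gt;
  &lt;str name="facet.range"&gt;sale_price&lt;/str&gt;
  &lt;str name="f.sale_price.facet.range.start"&gt;0&lt;/str&gt;
  &lt;str name="f.sale_price.facet.range.end"&gt;2000000&lt;/str&gt;
  &lt;str name="f.sale_price.facet.range.gap"&gt;250000&lt;/str&gt;
  &lt;!-- date range faceting: one bucket per year over the last five years --&gt;
  &lt;str name="facet.date"&gt;sale_date&lt;/str&gt;
  &lt;str name="facet.date.start"&gt;NOW/YEAR-5YEARS&lt;/str&gt;
  &lt;str name="facet.date.end"&gt;NOW&lt;/str&gt;
  &lt;str name="facet.date.gap"&gt;+1YEAR&lt;/str&gt;
  &lt;!-- facet by query --&gt;
  &lt;str name="facet.query"&gt;pet_friendly:true&lt;/str&gt;
&lt;/lst&gt;</pre>
</blockquote>
<p>The same parameters can be passed on the request URL instead; putting them in the defaults simply keeps the front end simpler.</p>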
<p>Additionally, helping customers discover items of interest goes well  beyond facets.  Features like Did you mean, Related Items/Searches,  Collaborative Filtering/Recommenders (see Mahout for an open source  solution), Auto Suggest and others can go a long way in increasing the  user&#8217;s ability to purchase items from your store.  Many of these  features I&#8217;ll cover below.</p>
<h2>Flexible language analysis tools</h2>
<div>Lucene and Solr have an extensive, open language analysis framework  that makes it easy to do linguistic analysis.  I won&#8217;t spend too much  time here, as you can have a look at the included schema.xml for  information on the various pieces I used.  Also, have a look at the <a href="http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters">Solr wiki</a> for more info.  Suffice it to say, Solr has many tokenizers, stemmers  and other token modification capabilities.  In many cases, a good search  system will use a variety of techniques (case changes, stemming,  synonyms, etc.) to achieve the desired results.  It is also often useful  to build up a list of protected words for things like product names so  that they do not get confused with other words that share a common  root.  Finally, keep in mind that of all the extension points to Lucene  and Solr, writing your own TokenFilter is one of the easiest things you  can do to extend the capabilities of your application.</div>
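<div>For illustration only, an analysis chain that combines those techniques might look like the following in schema.xml.  The type name text_products is hypothetical, and the sketch assumes Solr 3.1 or later, where KeywordMarkerFilterFactory is available; synonyms.txt and protwords.txt are the file names used in the Solr example configuration.</div>
<blockquote>
<pre>&lt;fieldType name="text_products" class="solr.TextField" positionIncrementGap="100"&gt;
  &lt;analyzer&gt;
    &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
    &lt;!-- case changes --&gt;
    &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
    &lt;!-- synonyms --&gt;
    &lt;filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/&gt;
    &lt;!-- protected words (e.g. product names) are marked so the stemmer skips them --&gt;
    &lt;filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/&gt;
    &lt;!-- stemming --&gt;
    &lt;filter class="solr.PorterStemFilterFactory"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;</pre>
</blockquote>
<div>Filter order matters here: the keyword marker has to come before the stemmer, or the protected words will already have been stemmed.</div>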
<h2>Multilingual support</h2>
<p>Solr contains support for most of the commonly spoken languages in the world, including English, Chinese, French, Spanish, Korean, German, Thai and many more.  Lucene and Solr are also Unicode compliant.</p>
<h2>Frequent Incremental Updates</h2>
<p>Lucene, and thus Solr, has supported incremental updates from its inception without the need to re-index the whole collection.  It is also very fast at making new documents available for search.  Additionally, with the combination of recent and upcoming work in Lucene, real time search should be available soon.  The one piece that is still missing is individual field update, but for certain types of fields (ratings, for instance), there may be easy workarounds.</p>
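<p>As a minimal sketch of what an incremental update looks like, the XML message below can be posted to Solr&#8217;s update handler.  The field names and values are invented for the example, and it assumes id is the schema&#8217;s uniqueKey, so a document with the same id replaces the earlier version.  Because there is no individual field update, the whole document is sent again even if only the rating changed.</p>
<blockquote>
<pre>&lt;add&gt;
  &lt;doc&gt;
    &lt;field name="id"&gt;item-42&lt;/field&gt;
    &lt;field name="name"&gt;Widget X&lt;/field&gt;
    &lt;field name="rating"&gt;4.5&lt;/field&gt;
  &lt;/doc&gt;
&lt;/add&gt;</pre>
</blockquote>
<p>A separate &lt;commit/&gt; message then makes the change visible to searchers.</p>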
<h2>Ratings and Reviews</h2>
<p>In working with many ecommerce customers on Solr, there are usually  questions around how to incorporate ratings and reviews into search  results without skewing results or introducing too much noise.   On the  ratings side, app developers often want to incorporate the aggregate  rating of an item as a boost factor in the overall score.  I will  discuss how to do this in detail in the section titled Editorial  Relevance Controls below.  Meanwhile, on the review side, it is often  the case that too much noise is introduced by including reviews &#8220;on par&#8221;  with matches in the product title or description.  For instance, if I&#8217;m  selling &#8220;Widget X&#8221; and a review for a different product says something  like &#8220;You should also check out Widget X&#8221;, bringing back a match on that  second product really isn&#8217;t all that useful for a customer searching  for &#8220;Widget X&#8221;.   To deal with this noise, people often take a couple of  different approaches:</p>
<ol>
<li>They weight review matches lower than product matches via boosting (either at query time or indexing time)</li>
<li>They only search reviews if they don&#8217;t feel they have high quality matches for the main product search</li>
</ol>
<p>You could also do some type of post processing analysis (NLP) of the review to see if it is on topic, but this approach likely isn&#8217;t viable for most people in most situations due to the processing power it requires and the limited accuracy of such a solution.  As for #2 above, see my post on <a href="http://www.lucidimagination.com/blog/2009/08/12/fake-and-invisible-queries/">Fake and Invisible Queries</a> for more insight.</p>
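<p>For approach #1, query time boosting can be as small as one dismax parameter.  The sketch below assumes hypothetical fields named title, description and reviews, and the weights are arbitrary; you would tune them against your own data.</p>
<blockquote>
<pre>&lt;lst name="defaults"&gt;
  &lt;str name="defType"&gt;dismax&lt;/str&gt;
  &lt;!-- review matches still count, but far less than product matches --&gt;
  &lt;str name="qf"&gt;title^5.0 description^2.0 reviews^0.2&lt;/str&gt;
&lt;/lst&gt;</pre>
</blockquote>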
<h2>Auto-suggest</h2>
<p>Auto suggest (aka auto complete) is one of the cheapest (in terms of development costs) mechanisms available for enhancing the chance that users find what they are looking for.  I&#8217;ve heard of vendors adding auto-suggest and having it add millions to their bottom line.  Simply by providing a drop down list of ways of completing what a user has typed so far, an application can do a number of things:</p>
<ol>
<li>Reduce spelling errors thus leading to lower frustration and better results sooner rather than later</li>
<li>Seed the user with items that they may want but weren&#8217;t explicitly  looking for.  After all, an intelligent auto-suggest box can very easily  not only give completions, but it can also hook in related items too.</li>
<li>Short-circuit search altogether and go directly to a landing page for a specific search</li>
</ol>
<div>
<dl id="attachment_2317">
<dt><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/04/ecomm-sample-auto-suggest.png"><img title="Example Auto-Suggest screen" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/04/ecomm-sample-auto-suggest.png" alt="" width="429" height="340" /></a></dt>
<dd>Example Auto-Suggest Screen Capture</dd>
</dl>
</div>
<p>For the demo, I implemented auto-suggest using SOLR-1316, which  should be committed to trunk soon.  Note, also, there are other ways of  doing auto-suggest, too, including using the TermsComponent and  Faceting.  Here are the steps I went through to make auto-suggest work:</p>
<ol>
<li>Applied the SOLR-1316 patch to the 3.x branch.  This required a minor tweak to the HighFreqDictionary.java file.  See the patch below.</li>
<li>Added the necessary piece to the solrconfig.xml.  See the /autosuggest SearchComponent in the solrconfig.xml in the appendix.</li>
<li>Decided what fields to use in building the auto-suggest index (see schema.xml).  I then &#8220;copy fielded&#8221; these into a field named suggest.  Note that I used a non-stemming analyzer.  I also used Solr&#8217;s word-based n-gram filter with a shingle base of 5 so as to give phrase suggestions too.  Note, this is intended for demonstration purposes, as you may wish to not use shingles and append terms as the user types or you may want to use a different value for n.  Also note, I did not spend much time at all on evaluating what went into the suggest field that is used as a source.  You will want to validate it and make sure it is aligned with your business goals.</li>
<li>Built the auto-suggest data structures via the Spell Checker build command (see the next section).</li>
<li>Modified the jQuery script that is in the Solr VelocityResponseWriter example to use the SOLR-1316 output instead of the TermsComponent output.  See the autocomplete.vm file for details on the JavaScript.  See the next section on Did You Mean on how to make requests to the auto-suggest component, as it uses the same mechanism as the spell checker.</li>
</ol>
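<p>The schema side of step 3 can be sketched roughly as follows.  The field name suggest and the shingle size of 5 come from the description above; the type name text_suggest and the source fields title and description are placeholders of mine, so check the demo&#8217;s schema.xml for the real ones.</p>
<blockquote>
<pre>&lt;fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100"&gt;
  &lt;analyzer&gt;
    &lt;!-- no stemmer, so suggestions show real words --&gt;
    &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
    &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
    &lt;!-- word n-grams (shingles) of up to 5 words; single words are kept too --&gt;
    &lt;filter class="solr.ShingleFilterFactory" maxShingleSize="5" outputUnigrams="true"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;

&lt;field name="suggest" type="text_suggest" indexed="true" stored="false" multiValued="true"/&gt;

&lt;copyField source="title" dest="suggest"/&gt;
&lt;copyField source="description" dest="suggest"/&gt;</pre>
</blockquote>
<p>Lowering maxShingleSize shrinks the suggestion dictionary at the cost of shorter phrase suggestions.</p>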
<p>Hopefully, from here you will have enough information to build your own auto-suggest capabilities.  If not, see our <a href="http://www.lucidimagination.com/search/?q=autosuggest">search site</a> for more info, including alternate approaches to the SOLR-1316 patch.</p>
<h2>Did You Mean?</h2>
<p>Just like auto-suggest, spell checking can be helpful to users in finding what they are looking for, especially given the propensity of manufacturers/product designers to use incorrectly spelled words in their product name in order to better &#8220;brand&#8221; the product.  Good spell checking goes beyond merely hooking up a dictionary of terms; it is also quite important to know when to suggest a term and when not to suggest a term.  Lucene/Solr has the basics of setting up spell checking covered via the SpellCheckComponent, but a good spell checking application will need to go beyond merely setting up the component in order to achieve good results.  First things first, however, let&#8217;s take a look at getting spell checking set up and then we can examine what is needed to make it better.</p>
<p>First, we need to configure the SpellCheckComponent in the solrconfig.xml file.  There is an example of this in the Solr tutorial example, from which I changed the distance measure from the Levenshtein edit distance to the Jaro-Winkler distance.  The reason I did this is based on past experience that users tend to misspell words towards the end of the word and not the beginning, which the Jaro-Winkler distance accounts for by favoring candidates that share a common prefix.  My configuration looks like:</p>
<blockquote>
<pre>&lt;searchComponent name="spellcheck" class="solr.SpellCheckComponent"&gt;
  &lt;str name="queryAnalyzerFieldType"&gt;textSpell&lt;/str&gt;
  &lt;lst name="spellchecker"&gt;
    &lt;str name="name"&gt;default&lt;/str&gt;
    &lt;str name="field"&gt;spell&lt;/str&gt;
    &lt;str name="spellcheckIndexDir"&gt;./spellchecker&lt;/str&gt;
    &lt;!-- Jaro-Winkler in place of the default edit distance --&gt;
    &lt;str name="distanceMeasure"&gt;org.apache.lucene.search.spell.JaroWinklerDistance&lt;/str&gt;
  &lt;/lst&gt;
  &lt;!-- ... --&gt;
&lt;/searchComponent&gt;</pre>
</blockquote>
<p>The whole point of a SearchComponent such as the SpellCheckComponent is to hook it into the main Solr request processing instead of having to make a separate call.  Thus, I hooked the SpellCheckComponent into the /nyc RequestHandler so that all queries that are submitted to the &#8220;main&#8221; RequestHandler will also be spell checked.  Once the configuration is set up, the spelling index must be built (and maintained.)  This is handled by issuing an &amp;spellcheck.build=true command to the spell checker, as in:</p>
<blockquote><p><a href="http://localhost:8983/solr/autosuggest?q=man&amp;spellcheck=true&amp;wt=xml&amp;rows=0&amp;indent=true&amp;spellcheck.build=true">http://localhost:8983/solr/autosuggest?q=man&amp;spellcheck=true&amp;wt=xml&amp;rows=0&amp;indent=true&amp;spellcheck.build=true</a></p></blockquote>
<p>(Note, the &amp;q param can be anything.)</p>
<p>Once the configuration is hooked up and the spell checking data structure is built, the last piece is to hook it into the UI.  (Note, I set up the solrconfig.xml to automatically do spell checking on every query request.)  To hook into the UI, I co-opted the suggest.vm file and spruced it up a bit to provide links, etc.  Other than that, it is exactly the same, since both are just different implementations of spell checking.</p>
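<p>Hooking the component into a handler and turning it on for every request can be sketched as below.  This is not copied from the demo&#8217;s solrconfig.xml; the count and collate settings in particular are just reasonable guesses, and the other handler defaults are omitted.</p>
<blockquote>
<pre>&lt;requestHandler name="/nyc" class="solr.SearchHandler"&gt;
  &lt;lst name="defaults"&gt;
    &lt;!-- spell check every query without the client asking for it --&gt;
    &lt;str name="spellcheck"&gt;true&lt;/str&gt;
    &lt;str name="spellcheck.dictionary"&gt;default&lt;/str&gt;
    &lt;str name="spellcheck.count"&gt;5&lt;/str&gt;
    &lt;!-- also return the whole query rewritten with the best suggestions --&gt;
    &lt;str name="spellcheck.collate"&gt;true&lt;/str&gt;
  &lt;/lst&gt;
  &lt;!-- run the spellcheck component after the standard components --&gt;
  &lt;arr name="last-components"&gt;
    &lt;str&gt;spellcheck&lt;/str&gt;
  &lt;/arr&gt;
&lt;/requestHandler&gt;</pre>
</blockquote>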
<p>See the Solr wiki on the <a href="http://wiki.apache.org/solr/SpellCheckComponent">SpellCheckComponent</a> for more information.</p>
<h2>Related Searches/Items</h2>
<p>In many ecommerce applications, stores position related items next to  a particular item so as to inspire the user to either buy an additional  item or offer an alternative.  Naturally, the &#8220;relation&#8221; is determined  by the store and might take on a variety of forms, such as: accessories,  enhanced versions, cheaper versions, alternatives from different  manufacturers or popular items based on other users.  Similarly, a store  may wish to give users not only suggestions and spelling corrections,  but they may also want to give users alternative search terms or other  popular searches.  For instance, if a user searches for TVs, a store may  want to suggest they search for &#8220;LCD TVs&#8221; or &#8220;HD TVs&#8221;, etc.</p>
<p>When it comes to related items, many Solr users rely on hand-crafting a second query (given an original query and a particular item) by using the original terms of the query and some of the terms that describe the item.  For instance, an application might use the category of the item plus some of the keywords for that item to then craft the query, submit it to Solr and then display the first few results.  This approach can also be done automatically using Solr&#8217;s built-in <a href="http://wiki.apache.org/solr/MoreLikeThis">More Like This</a> (MLT) capability, but you may need to do some tuning to get the results you desire.  For the sake of the example, I incorporated MLT into the application.  You can see it on the left hand side, just below the map, under the &#8220;Similar Properties&#8221; heading.  The configuration of MLT was done in the solrconfig.xml file as part of the /nyc RequestHandler.  Note, in a typical application you may not wish to generate MLT results for a search query, but instead only provide them once a user chooses a particular document, as MLT can add a fair amount of overhead to the process.  Other Solr applications will often calculate related items offline or through some type of collaborative filtering approach (see Apache <a href="https://cwiki.apache.org/confluence/display/MAHOUT/Recommender+Documentation">Mahout&#8217;s recommender capability</a> for an open source library to do this) and either add the information to the document and re-index or integrate it at the application level.  In these cases, it&#8217;s not hard to integrate, but it is beyond the scope of this article.</p>
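<p>For reference, MLT settings in a handler&#8217;s defaults generally take the shape below.  The field names and thresholds are illustrative assumptions rather than the demo&#8217;s values; the fields listed in mlt.fl should be stored or have term vectors enabled.</p>
<blockquote>
<pre>&lt;lst name="defaults"&gt;
  &lt;str name="mlt"&gt;true&lt;/str&gt;
  &lt;!-- fields to mine for "interesting" terms --&gt;
  &lt;str name="mlt.fl"&gt;description,neighborhood&lt;/str&gt;
  &lt;!-- number of similar documents to return per result --&gt;
  &lt;str name="mlt.count"&gt;3&lt;/str&gt;
  &lt;!-- ignore terms occurring less than once in the source doc or in fewer than 2 docs overall --&gt;
  &lt;str name="mlt.mintf"&gt;1&lt;/str&gt;
  &lt;str name="mlt.mindf"&gt;2&lt;/str&gt;
&lt;/lst&gt;</pre>
</blockquote>
<p>Leaving mlt out of the defaults and passing mlt=true only on item detail requests is one way to avoid the overhead mentioned above.</p>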
<p>As for the functionality to add related searches, there is not currently support built into Solr, but there is a <a href="https://issues.apache.org/jira/browse/SOLR-2080">JIRA issue</a> open to track the idea.  Related searches can often be determined  through a combination of log analysis (look for patterns in a user  session) and synonyms or via collaborative filtering/recommenders.   Also, have a look at <a href="https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining">Mahout&#8217;s Frequent Pattern Mining capabilities</a>.  One could also index the queries into another index (Solr core) and simply issue fuzzy queries to it.</p>
<h2>Editorial Relevance Controls</h2>
<div>Whether it&#8217;s called &#8220;editorial controls&#8221;, &#8220;sponsored results&#8221;, &#8220;best bets&#8221; or any other name, the ability to implement business goals as part of search is a fundamental need of any ecommerce solution.  Hidden in the various names is a desire to have total control of search relevance without sacrificing speed or hindering the engine from working well when no business rules are applicable.  Solr and Lucene offer a myriad of mechanisms to achieve business goals ranging from the typical boost values on documents, fields, tokens and query terms to the hardcore &#8220;gotta have it exactly my way&#8221; option of cracking open the source and adding your own query mechanism.  In between these two extremes are a whole range of things like <a href="http://wiki.apache.org/solr/FunctionQuery">function queries</a>, payloads, the <a href="http://wiki.apache.org/solr/QueryElevationComponent">QueryElevationComponent</a> for setting fixed results as well as excluding specific documents, <a href="http://wiki.apache.org/solr/SolrPlugins#Similarity">similarity adjustments</a>, <a href="http://wiki.apache.org/solr/DisMaxQParserPlugin">augmented queries (such as automatic phrase boosting)</a> and much more.  Of these, most people rely on function queries, the dismax extensions and the QueryElevationComponent to achieve their relevance goals.</div>
<div>In the working example, I made a couple of changes to demonstrate some of the relevance ideas described here:</div>
<div>
<ol>
<li>The /nyc RequestHandler has the QueryElevationComponent hooked in and keyed off of the elevate.xml file.  In that file, I mapped the query &#8220;3 bedroom Brooklyn&#8221; to rank a specific document higher and exclude one other.  See <a href="http://localhost:8983/solr/admin/file/?file=elevate.xml">http://localhost:8983/solr/admin/file/?file=elevate.xml</a> for the mapping.  To see the difference elevation makes, compare the normal results with those you get after adding &amp;enableElevation=false to the query, as in: <a href="http://localhost:8983/solr/nyc?q=3+bedroom+Brooklyn&amp;enableElevation=false">http://localhost:8983/solr/nyc?q=3+bedroom+Brooklyn&amp;enableElevation=false</a></li>
<li>I set up &#8220;phrase boosting&#8221; on the description field so that phrases generated from the query are matched against it.  See the /nyc RequestHandler (it&#8217;s the &#8220;pf&#8221; setting in the solrconfig.xml).</li>
<li>I added a &#8220;boost function&#8221; to rank documents higher based on the  commission paid for selling the property (note, I randomly assigned a  value to this field for pedagogical reasons).  See the &#8220;bf&#8221; setting in  the /nyc RequestHandler.</li>
<li>Also, don&#8217;t forget creative domain modeling:  for instance, if you want to support landing pages and banners, why not just create them as documents in your index (assign a type to them) and make sure they are at the top of the results?  Other possibilities include doing two queries, one for landing pages first and then one for the results.</li>
</ol>
</div>
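<div>For readers who have not seen one, an elevate.xml entry for the first item above has roughly this form.  The document ids are made up; the real mapping is in the demo&#8217;s elevate.xml, and the ids must match values of your schema&#8217;s uniqueKey field.</div>
<blockquote>
<pre>&lt;elevate&gt;
  &lt;query text="3 bedroom Brooklyn"&gt;
    &lt;!-- pinned to the top of the results --&gt;
    &lt;doc id="LISTING-1001"/&gt;
    &lt;!-- removed from the results for this query --&gt;
    &lt;doc id="LISTING-2002" exclude="true"/&gt;
  &lt;/query&gt;
&lt;/elevate&gt;</pre>
</blockquote>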
<div>If you are so inclined, you can also extend Solr and Lucene.  Before you do, however, you might want to <a href="http://search.lucidimagination.com">search for your issue</a>, or even ask on the appropriate mailing list.  If that doesn&#8217;t help, I recommend starting with the <a href="http://wiki.apache.org/solr/SolrPlugins">Solr Plugins</a> wiki page and then you can dig into the source from there if necessary.  My advice:  If you think you need a new Query class (a low-level Lucene mechanism for custom scoring), see if you can solve your problem via a FunctionQuery (even a custom one) first and maybe some other mechanisms before going down the Query path.</div>
<h2>Administration</h2>
<p>Administration means many things to many people.  To the IT department, it means easy setup, configuration, monitoring, maintenance, scalability, fault tolerance, etc. while to the business user it means tools for manipulating results, reporting search statistics and following through on business goals.  While the latter is important, I am going to focus on the IT dept. needs for the sake of this article.  Solr is very easy for an IT person to set up with a baseline configuration in place.  I&#8217;ve seen customers (without my help) be up and running and searching their data in non-trivial ways in as little as 30 minutes, sometimes less.  As for monitoring, Solr comes with web pages that report status as well as JMX integration.  I&#8217;ve also seen Solr integrated nicely with Nagios, Cacti and other tools.  Lucid Imagination also partners with <a href="http://www.lucidimagination.com/performanceportal">New Relic</a> to offer Solr specific monitoring tools.</p>
<p>As for the big questions of whether Solr is scalable and fault tolerant, the answer is an unequivocal yes.  High traffic ecommerce sites like Zappos, Netflix, CNET, AOL and many others use Solr to serve their search needs.  Solr can be set up to handle both large indexes and high query volumes.  For more information on how to do this, see Mark Miller&#8217;s excellent article on <a href="http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr">scaling Solr</a>.</p>
<h2>Recommendations (See Mahout)</h2>
<p>For both online and offline recommendation calculations, see the <a href="http://mahout.apache.org">Apache Mahout</a> project, which has an excellent collaborative filtering library.   While integration with Solr does not yet exist, Mahout does expose web  services (as well as Java APIs) for its recommender engine, so it is  feasible to integrate it within an application.</p>
<h2>Analytics and other Business Tools</h2>
<p>Analytics are probably Solr&#8217;s weakest area, but that being said, we find that many customers already have platforms in place (like Omniture) that they can easily integrate Solr into.  This often saves business users from having to learn yet another tool.  As for other business tools, Solr likely does not have them (for instance, merchandising tools), but again, many people find it straightforward to integrate Solr into existing tools.  Also, this is an area where LucidWorks, with its administrative UI, really can help.  It has screens and tools for doing log analysis and seeing what the popular queries are, as well as popular terms and queries that return zero results.</p>
<h1>Looking Forward</h1>
<p>Solr is a very popular and capable search engine for ecommerce and, looking forward,  it is only getting better.  With a focus on greater features (spatial  search, for instance), the latest Lucene and easier scalability, the  next version of Solr promises to be even better.</p>
<h1>Appendix A</h1>
<p>Items needed here: schema, solrconfig, SOLR-1316 3.x branch patch</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/01/25/implementing-the-ecommerce-checklist-with-apache-solr-and-lucidworks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Apache Lucene Ecosystem: My View of 2010</title>
		<link>http://www.lucidimagination.com/blog/2010/12/27/the-apache-lucene-ecosystem-my-view-of-2010/</link>
		<comments>http://www.lucidimagination.com/blog/2010/12/27/the-apache-lucene-ecosystem-my-view-of-2010/#comments</comments>
		<pubDate>Mon, 27 Dec 2010 15:54:11 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[Droids]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucene Connector Framework]]></category>
		<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Lucy]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[ManifoldCF]]></category>
		<category><![CDATA[nutch]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[PyLucene]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Tika]]></category>
		<category><![CDATA[ZooKeeper]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2809</guid>
		<description><![CDATA[<p>After a week off to enjoy time with my family, I thought I would kick off the last week of 2010 with a look back at the year as it relates to the Apache Lucene ecosystem.  For anyone who follows the amalgamation of projects that I like to call the Lucene Ecosystem (the Apache projects: Lucene, Solr, Nutch, Mahout, Tika, PyLucene, Lucy, Lucene.NET, Droids, ManifoldCF &#8212; Lucene Connector Framework, OpenNLP and UIMA) you know it &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>After a week off to enjoy time with my family, I thought I would kick off the last week of 2010 with a look back at the year as it relates to the Apache Lucene ecosystem.  For anyone who follows the amalgamation of projects that I like to call the Lucene Ecosystem (the Apache projects: Lucene, Solr, Nutch, Mahout, Tika, PyLucene, Lucy, Lucene.NET, Droids, ManifoldCF &#8212; Lucene Connector Framework, OpenNLP and UIMA) you know it has been an amazingly busy and fruitful year.  Instead of going through each project like <a href="http://www.lucidimagination.com/blog/2009/12/24/the-apache-lucene-ecosystem-my-view-of-2009/">last year&#8217;s review</a>, I&#8217;m just going to be a bit less formal and hit on the highlights as I see them.</p>
<p>Before I dig in too much, though, a special thanks to all our customers at Lucid Imagination as well as to my coworkers.  I&#8217;m coming up on 15 years out in the &#8220;real world&#8221; and I can honestly say I&#8217;ve never enjoyed what I do as much as I do here and that even accounts for the normal rough patches one goes through in any job.  As an engineer, there are few things as cool as getting to work with customers who are not only using, but pushing your work/project/product on a daily basis to do new and interesting things (I think this is a direct result of the project being Open Source, which I believe has an inherently <a href="http://www.lucidimagination.com/blog/2009/04/20/lucene-open-source-and-the-cost-of-experimentation/">lower cost of experimentation</a>).  I&#8217;ve been fortunate enough to meet and talk with many people doing all kinds of things with Lucene and Solr ranging from the &#8220;mundane&#8221; of basic keyword search to those building next generation search capabilities at incredible scale.  Through it all, I&#8217;m constantly amazed at the flexibility and efficiency of Lucene and Solr.  For instance, I&#8217;ve been working with one customer now whose Solr-based solution (for the exact same content) will use ~50% less hardware and will have an index that is 1/6 the size of their FAST index all while saving them major dinero.</p>
<p>Speaking of Lucid, one of the highlights of the year for us that relates directly to Lucene and Solr is the launch of our enterprise version: <a href="http://www.lucidimagination.com/lwe/download">LucidWorks Enterprise</a>.   I like to think of it as Apache Solr with a whole lot of Lucid expertise on how to use Solr baked in and topped off with other features and functionality to make building search applications easier.</p>
<p>OK, time to move on to the open source projects&#8230;</p>
<ol>
<li>Without a doubt, the biggest news of the year is the merging of the Lucene and Solr code base as well as the &#8220;graduation&#8221; of several subprojects to Apache Software Foundation Top Level Projects (TLP).  The graduating projects are <a href="http://tika.apache.org">Tika</a>, <a href="http://nutch.apache.org">Nutch</a>, and <a href="http://mahout.apache.org">Mahout</a>.  We also spun Lucy (a C port) to the Incubator, where it is working on its own community.  These moves were primarily done to focus the project management on a single code base, but they also demonstrate the project has reached a level of maturity at the ASF.  The move also has the side benefit of bringing each project higher visibility.</li>
<li>I&#8217;m particularly excited about the addition of <a href="http://www.lucidimagination.com/blog/2010/12/02/opennlp-moving-to-apache/">OpenNLP to the Apache</a> umbrella.  OpenNLP is a nice open source Java project for natural language processing that has lived at SourceForge for quite some time.  I would expect development to grow quite a bit under the ASF community based model.  Also, integrating OpenNLP with Solr and Lucene is pretty easy to do.  I would be remiss if I didn&#8217;t also give a nod to the addition of the <a href="http://incubator.apache.org/connectors">ManifoldCF</a> project to the ASF.  ManifoldCF will help unlock content in SharePoint, Documentum and other repositories for users of Lucene and Solr.</li>
<li>Lucene&#8217;s trunk code base now implements our &#8220;Flex APIs&#8221;, which should allow users to have near total control over what goes in the index as well as alternate compression techniques, different scoring models, etc.  See Michael McCandless&#8217; excellent <a href="http://www.lucidimagination.com/files/file/LuceneRev_McCandless_FunWithFlex.pdf">talk at Lucene Revolution</a> for more details.</li>
<li>With all the location aware devices and capabilities on the market, geo-spatial search is a hot topic and Lucene and Solr have been adding quite a few capabilities in this regard with the ability to filter, boost and sort results based on location information in documents.  See Solr&#8217;s <a href="http://wiki.apache.org/solr/SpatialSearch">Spatial Search Wiki page</a> for more info as well as several of my <a href="http://www.lucidimagination.com/search/?q=spatial#/s:lucid/li:blogs">past blog posts</a>.</li>
<li>Of course, everyone was abuzz about the cloud this year.  For Solr, this translates into greater efforts to make Solr easier to scale to very large installations (100s to 1000s of nodes and billions and billions of documents) via the <a href="http://wiki.apache.org/solr/SolrCloud">Solr Cloud project that Yonik Seeley and Mark Miller have been spearheading</a>.</li>
<li>On the user side, one of the biggest pieces of buzz this year related to Lucene was the migration of Twitter search to Lucene.  At 1 billion queries per day and 50 million posts per day (all indexed and searchable in near real time), Twitter&#8217;s search system certainly has its work cut out for it.  However, as Michael Busch <a href="http://www.lucidimagination.com/events/revolution2010/videos/mbusch">outlined at Lucene Revolution</a>, Apache Lucene was up to the task!  Naturally, there were lots of other companies that migrated to Solr and Lucene as well.  Have you <a href="http://www.lucidimagination.com/enterprise-search-solutions/case-studies">shared your use case</a>?</li>
</ol>
<p>Well, I&#8217;ve no doubt missed a bunch of other things, but those items, to me, are some of the bigger highlights.  Looking forward, there are some other exciting things coming to Lucene and Solr.  In particular, I&#8217;m working on adding language identification, related searches and point in polygon filtering to Solr.  I would also expect we will release Lucene/Solr 3.1 fairly soon, too, but you can&#8217;t pin me down on a date just yet.</p>
<p>Here&#8217;s hoping you all have a Happy Holidays and a Happy New Year!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/12/27/the-apache-lucene-ecosystem-my-view-of-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Lucid Query Parser demo afterword</title>
		<link>http://www.lucidimagination.com/blog/2010/12/06/lucid-query-parser-demo-afterword/</link>
		<comments>http://www.lucidimagination.com/blog/2010/12/06/lucid-query-parser-demo-afterword/#comments</comments>
		<pubDate>Mon, 06 Dec 2010 22:04:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[LucidWorks]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2761</guid>
		<description><![CDATA[<p>Thank you to all who joined our Lucid Query Parser session last Wednesday. If you didn’t, catch up with the recorded session <a href="http://www.lucidimagination.com/solutions/webcasts/Exploring-the-Lucid-Query-Parser">here</a>.</p>
<p>The Lucid Query Parser is distributed as part of LucidWorks Enterprise, and it enhances your users&#8217; ability to pinpoint information to be retrieved from LucidWorks Enterprise. Some LucidWorks Enterprise developers have opted to allow their users the full power of the parser, which is a viable strategy, as the parser is &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Thank you to all who joined our Lucid Query Parser session last Wednesday. If you didn’t, catch up with the recorded session <a href="http://www.lucidimagination.com/solutions/webcasts/Exploring-the-Lucid-Query-Parser">here</a>.</p>
<p>The Lucid Query Parser is distributed as part of LucidWorks Enterprise, and it enhances your users&#8217; ability to pinpoint information to be retrieved from LucidWorks Enterprise. Some LucidWorks Enterprise developers have opted to allow their users the full power of the parser, which is a viable strategy, as the parser is completely resilient to syntax errors and will never throw an exception. Others have developed a pre-parser, which modifies the query based on their application-specific needs.</p>
<p>As for us at Lucid Imagination, we opted to let our users at search.lucidimagination.com benefit from the parser as is; looking at the query logs, we see some sophisticated queries being submitted. For those who fully utilize the parser’s capabilities, great job!</p>
<p>As always, during the demo, you guys presented me with questions about Lucid Query Parser. Here’s a recap of them.</p>
<p><em><strong>Q: Which features (if any) will now go into open source Solr development and which features will go into LucidWorks Enterprise?</strong></em><br />
A: LucidWorks Enterprise uses a full version of Solr, without any modifications or changes. In fact, it is extremely important for us at Lucid Imagination to maintain the innovation and leadership of Apache Solr. As a rule of thumb, any code that touches core Solr is developed in Apache Solr first, before it is introduced to LucidWorks Enterprise. Such were the cases of Solr Cloud and Field Collapsing, a Lucid Imagination contribution to Solr. For non-core features, such as the admin UI and the ReST API, we are evaluating them on a case-by-case basis, and might open source them as we see fit.</p>
<p><em><strong>Q: Do you have a write-up describing which features are in which layers: Lucene, Solr, LucidWorks Enterprise?</strong></em><br />
A: Lucene, Solr and LucidWorks Enterprise are complementary technologies that build on one another and offer very similar underlying capabilities. In choosing the search solution best suited to your requirements, key factors to consider are application scope, development environment, time to deployment, rate of change and growth, and software development preferences.</p>
<p>Lucene is a Java-based search library that offers speed, relevancy ranking, complete query capabilities, portability, scalability, low-overhead indexes, and rapid incremental indexing. This is the technology in its raw form: fully exposed, tweakable and customizable, but it requires Java application development to access all of the capabilities.</p>
<p>Solr is the Lucene Search Server. It presents a web service layer built atop Lucene, extending it to provide developers with an easy-to-use search server. Solr brings with it operational and administrative capabilities like web services, faceting, configurable schema, caching, replication, and administrative tools for configuration, data loading, statistics, logging, cache management, and more.</p>
<p>LucidWorks Enterprise adds a management, configuration, and application integration layer atop Solr, extending it to provide application developers with a ready-to-use search platform. LucidWorks Enterprise brings with it a management and configuration ReST API for rapid integrations, an integrated crawler for data acquisition, an admin and search UI, an industrial-strength query parser, document security with connectivity to LDAP systems, and more.</p>
<p><em><strong>Q: Is the query parser part of Solr release or a feature only for LucidWorks Enterprise?</strong></em><br />
A: It is a LucidWorks Enterprise feature. LucidWorks Enterprise contains some additional value-add components, the query parser among them, allowing for a speedier development process, broader developer skill sets, and a lower cost of growth and change adaptation.</p>
<p><em><strong>Q: Can the query parser be used independently of the rest of LucidWorks Enterprise?</strong></em><br />
A: Please consider the following points before diving into the individual LucidWorks Enterprise components. All LucidWorks Enterprise value-add components are licensed under the Lucid Imagination developer license (link HERE), allowing you to develop, test, evaluate and integrate for free; however, production use of LucidWorks Enterprise value-add components requires a subscription service with Lucid Imagination. In addition, Lucid Imagination does not recommend using the parser standalone, due to the strong integration points with the LucidWorks Enterprise schema.</p>
<p><em><strong>Q: What are open ended range queries?</strong></em></p>
<p>A: To put the question in context, it came up during the presentation of range queries. Range queries can be enclosed in curly brackets or square brackets. Curly brackets tell the parser to use an exclusive range (or, as I referred to it, an open-ended range), while square brackets instruct the parser to use an inclusive range (or closed-ended range). The two can be mixed, one bracket type per endpoint. Examples:<br />
<strong>[cat TO dog]</strong> All terms lexically between &#8220;cat&#8221; and &#8220;dog&#8221;, including &#8220;cat&#8221; and &#8220;dog&#8221;<br />
<strong>{cat TO dog}</strong> All terms lexically between &#8220;cat&#8221; and &#8220;dog&#8221;, excluding &#8220;cat&#8221; and &#8220;dog&#8221;<br />
<strong>{cat TO dog]</strong> All terms lexically between &#8220;cat&#8221; and &#8220;dog&#8221;, excluding &#8220;cat&#8221;, but including &#8220;dog&#8221;<br />
<strong>[cat TO dog}</strong> All terms lexically between &#8220;cat&#8221; and &#8220;dog&#8221;, including &#8220;cat&#8221;, but excluding &#8220;dog&#8221;</p>
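<p>The four bracket combinations above can be sketched in a few lines of Python. This is only an illustration of the inclusive and exclusive semantics, not the Lucid Query Parser implementation: the names parse_range and in_range are hypothetical, and plain string comparison stands in for the lexical term order of the index.</p>

```python
import re

# Hypothetical pattern for "[low TO high]" style expressions (illustration only).
RANGE_RE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+?)\s*([\]}])$")


def parse_range(expr):
    """Return (low, high, include_low, include_high) for a range expression."""
    match = RANGE_RE.match(expr.strip())
    if match is None:
        raise ValueError("not a range expression: " + expr)
    opener, low, high, closer = match.groups()
    # A square bracket makes that endpoint inclusive, a curly bracket exclusive.
    return low, high, opener == "[", closer == "]"


def in_range(term, expr):
    """Check whether a term falls inside the range, honoring each bracket."""
    low, high, include_low, include_high = parse_range(expr)
    above = term >= low if include_low else term > low
    below = term <= high if include_high else term < high
    return above and below
```

<p>With this sketch, in_range("cat", "[cat TO dog]") holds while in_range("cat", "{cat TO dog}") does not, mirroring the first two examples.</p>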
<p><strong><em>Q: What is the use of adding question marks to the syntax? This part of the webcast was not clear to me.</em></strong><br />
A: Sometimes we see queries that are formatted as natural language questions, such as “What is a CPA?”. Submitted to a common query parser, this query will produce very poor results, as the question mark at the end will be mistakenly treated as a wildcard, missing the term CPA and trying to match CPAN instead. (There go my taxes…) Lucid Query Parser will detect the natural language question form and will ignore the misleading question mark at the end. Additional natural language features are documented <a href="http://lucidworks.lucidimagination.com/display/LWEUG/Natural+Language+Queries">here</a>.</p>
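<p>To make the idea concrete, here is a rough Python sketch of such a pre-processing step. It is a deliberately naive heuristic written for illustration, not how Lucid Query Parser actually detects questions; the QUESTION_WORDS list and the strip_question_mark name are assumptions.</p>

```python
# Assumed list of words that open a natural language question.
QUESTION_WORDS = {"what", "who", "where", "when", "why", "how", "which"}


def strip_question_mark(query):
    """Drop a trailing "?" when the query reads like a question.

    Queries that do not start with a question word are left alone, so a
    deliberate single-character wildcard such as "CPA?" still works.
    """
    words = query.strip().split()
    if not words:
        return query
    looks_like_question = words[0].lower() in QUESTION_WORDS
    if looks_like_question and words[-1].endswith("?"):
        words[-1] = words[-1].rstrip("?")
    # Skip a last word that consisted only of question marks.
    return " ".join(word for word in words if word)
```

<p>Here strip_question_mark("What is a CPA?") gives "What is a CPA", while the bare wildcard query "CPA?" passes through unchanged.</p>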
<p><em><strong>Q: Can you ask &#8220;sony~0.8 and not sony&#8221;?</strong></em><br />
A: Yes. The query, as typed, is correct. A shorter format for this query is &#8220;sony~0.8 -sony&#8221;. Beware, however, that the shorter query has an implicit OR, not AND, between the positive terms. In this case there is only one positive term, making the two queries identical.</p>
<p><em><strong>Q: Is there an &#8220;explanation of understanding the query&#8221;?</strong></em><br />
A: Yes. In addition to Solr’s built-in debug query, which shows you the query that was actually submitted to Lucene, you can prefix your query with debugLog: this will output debug data from Lucid Query Parser to the console for further examination.</p>
<p><em><strong>Q: How would you explain this to users? What&#8217;s the reception rate of such &#8220;complex&#8221; queries?</strong></em><br />
A: Great question. A few approaches may be valid here, and all involve a UI that is smarter than &#8220;a text box&#8221;. One approach is to have an advanced mode, which allows users to specify their queries through a series of widgets, e.g. date range queries can be entered with UI date widgets, fuzzy queries can be automated by a checkbox saying &#8220;find misspelled content as well&#8221;, etc.<br />
Another approach is to provide a UI widget showing popular examples to your users, very much like I did with this demo. Users will then be able to create queries to their liking, using your samples as a reference.<br />
Another very effective approach is to empower your users with the field structure. Queries can be very effective when they are targeted at specific fields and field types.</p>
<p><em><strong>Q: Do you support phrases as synonyms, anti-stem, or stop words?</strong></em><br />
A: LucidWorks Enterprise supports phrases as synonyms. It does not support stop word phrases (e.g. ignore the phrase &#8220;dead end&#8221;, but not &#8220;dead&#8221; and not &#8220;end&#8221;).<br />
Regarding anti-stem, LucidWorks Enterprise does not support it out of the box, but it can easily be configured to do so, or to index both stemmed and unstemmed forms using copy fields.</p>
<p>There were some questions that were not clear to me. If you still have more questions, email them to me, and I’ll post them here.</p>
<p>Wishing you all a happy holiday season!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/12/06/lucid-query-parser-demo-afterword/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LucidWorks Enterprise Demo afterword</title>
		<link>http://www.lucidimagination.com/blog/2010/11/22/lucidworks-demo-afterword/</link>
		<comments>http://www.lucidimagination.com/blog/2010/11/22/lucidworks-demo-afterword/#comments</comments>
		<pubDate>Mon, 22 Nov 2010 18:21:42 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Lucid Imagination Solutions]]></category>
		<category><![CDATA[LucidWorks]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2699</guid>
		<description><![CDATA[<p>I’d like to thank you again for attending the LucidWorks Enterprise demonstration last Wednesday. For those of you who didn’t, pick up the recorded version <a href="http://www.lucidimagination.com/solutions/webcasts/LucidWorks-Enterprise-demo">here</a> to catch up.</p>
<p>During the demo, you presented me with questions about LucidWorks Enterprise. I was able to answer some of them immediately following the demo; however, our timeslot for the demo caught up with us, so some were left unanswered. The team here at Lucid Imagination helped me capture &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I’d like to thank you again for attending the LucidWorks Enterprise demonstration last Wednesday. For those of you who didn’t, pick up the recorded version <a href="http://www.lucidimagination.com/solutions/webcasts/LucidWorks-Enterprise-demo">here</a> to catch up.</p>
<p>During the demo, you presented me with questions about LucidWorks Enterprise. I was able to answer some of them immediately following the demo; however, our timeslot for the demo caught up with us, so some were left unanswered. The team here at Lucid Imagination helped me capture your questions, so let’s dive in:</p>
<p><em>Q: I have multiple domains, each domain should have its own search content and result presentation layer. For example: car search and job search, how could LucidWorks Enterprise handle multiple domains?</em></p>
<p>A: For complete separation of schema, relevancy and client UIs, use multiple collections. LucidWorks Enterprise is able to handle an unlimited number of collections. (Available Dec 2010)</p>
<p><em>Q: Will LucidWorks Enterprise search content within SWF files?</em></p>
<p>A: In order to search the content of SWF files, you’ll have to provide the SWF metadata and text to LucidWorks Enterprise. This can be done with an SWF file filter, available via commercial packages. LucidWorks Enterprise does not contain one out of the box.</p>
<p><em>Q: What about language sensitive search?</em></p>
<p>A: Even though the LucidWorks Enterprise admin UI is currently English only, LucidWorks Enterprise can support multiple languages both for content and for the search UI. There are multiple ways to do so; my favorite is to organize content in different languages under different fields, so that each field can be associated with its own field type and its own analysis chain.</p>
<p><em>Q: Is Lucid Works Solr (not the enterprise version) still under the Apache License?</em></p>
<p>A: Yes. The LucidWorks for Solr Certified Version is still licensed under the Apache License, Version 2.0.</p>
<p><em>Q: When LucidWorks Enterprise connects to a DB, how does it update the indexes if a user has created, updated or deleted multiple records?</em></p>
<p>A: When indexing content from a database, consider the following options: (1) use SQL logic for delta updates (manage a dirty flag, a last-update field, or other database logic); (2) use the delta SQL statement when running updates; (3) in some cases, use unique IDs from your database to manage duplicates in LucidWorks Enterprise indexes.</p>
<p><em>Q: Does Solr support personalized search?</em></p>
<p>A: No. Solr does not support personalized search. Personalized search is implemented in LucidWorks Enterprise only.</p>
<p><em>Q: Is there an online demo site that allows us to try it out?</em></p>
<p>A: Search.lucidimagination.com, which indexes the mailing lists and other relevant community content of Lucene, Solr, Tika and many other projects, is based on LucidWorks Enterprise. You can apply for the developer access program of LucidWorks Enterprise <a href="http://www.lucidimagination.com/developers/lucidworks-enterprise-developer-access-release">here</a>.</p>
<p><em>Q: Does this software support spatial searches?  e.g. &#8216;Show me records within 100 miles of zip code 90210&#8217;</em></p>
<p>A: The out-of-the-box distribution of LucidWorks Enterprise does not include spatial search; it can be added on top of the Solr included with LucidWorks Enterprise.</p>
<p><em>Q: What kinds of document types do you support?</em></p>
<p>A: Out of the box, we support HTML, XML, PPT, PPTX, XLS, XLSX, DOC, DOCX, ODF, PDF, Quattro, RTF, TXT, ZIP, TAR, MP3 and JPEG.</p>
<p><em>Q: Can you show how to add users to roles?</em></p>
<p>A: Users can be added either via the ReST API (See: <a href="http://lucidworks.lucidimagination.com/display/LWEUG/Users">http://lucidworks.lucidimagination.com/display/LWEUG/Users</a>) or via binding to an LDAP server (see: <a href="http://lucidworks.lucidimagination.com/display/LWEUG/LDAP+Integration">http://lucidworks.lucidimagination.com/display/LWEUG/LDAP+Integration</a>)</p>
<p>Groups are only supported via LDAP integration.</p>
<p>Users and groups can be configured to use the search filters (filtering documents for specific users and groups) via the LWE UI, at admin-&gt;roles.</p>
<p><em>Q: Has LucidWorks Enterprise been used on any Drupal CMS applications?</em></p>
<p>A: To the best of my knowledge, not in production yet.</p>
<p><em>Q: In a database import, how will the index be updated and in sync with database changes?</em></p>
<p>A: LucidWorks Enterprise supports two SQL queries for database importing: one for a full import, and one for subsequent delta imports. The two-statement configuration is available only through the ReST API.</p>
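<p>As a rough illustration of the full/delta pattern, the Python sketch below uses an in-memory SQLite database. The docs table, its last_modified column, and the fetch_rows helper are all assumptions made for the example; they are not part of the LucidWorks Enterprise ReST API or its configuration format.</p>

```python
import sqlite3

# Assumed schema: every row carries a last_modified timestamp (ISO 8601 text).
FULL_IMPORT_SQL = "SELECT id, title FROM docs ORDER BY id"
DELTA_IMPORT_SQL = (
    "SELECT id, title FROM docs WHERE last_modified > ? ORDER BY id"
)


def fetch_rows(conn, last_import_time=None):
    """Run the full query on the first import, the delta query afterwards."""
    if last_import_time is None:
        return conn.execute(FULL_IMPORT_SQL).fetchall()
    return conn.execute(DELTA_IMPORT_SQL, (last_import_time,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "Solr basics", "2010-11-01T09:00:00"),
        (2, "Faceting", "2010-11-20T14:30:00"),
    ],
)

# A dict keyed by the database ID stands in for the index: re-importing a
# row overwrites the earlier copy instead of creating a duplicate.
index = {}
for doc_id, title in fetch_rows(conn):
    index[doc_id] = title

# Only rows changed since the previous import are fetched the second time.
delta_rows = fetch_rows(conn, "2010-11-15T00:00:00")
for doc_id, title in delta_rows:
    index[doc_id] = title
```

<p>Keying on the database ID is what keeps repeated imports from piling up duplicates, which is the third option mentioned earlier for database content.</p>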
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/11/22/lucidworks-demo-afterword/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Announcement: New LucidWorks certified distribution for Solr</title>
		<link>http://www.lucidimagination.com/blog/2010/11/18/announcement-new-lucidworks-certified-distribution-for-solr-2/</link>
		<comments>http://www.lucidimagination.com/blog/2010/11/18/announcement-new-lucidworks-certified-distribution-for-solr-2/#comments</comments>
		<pubDate>Thu, 18 Nov 2010 19:01:51 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2654</guid>
		<description><![CDATA[<p>We’re happy to announce the availability of the latest release of LucidWorks Certified Distribution for Solr.</p>
<p>LucidWorks for Solr v1.4.1 is a bugfix release for LucidWorks for Solr,  incorporating Apache Solr release 1.4.1, with a few additional bugfixes that were introduced after Solr 1.4.1.</p>
<p>LucidWorks for Solr v1.4.1 is tested to work with Windows, Linux and Mac.</p>
<p>Grab your copy from our <a title="Download LucidWorks" href="http://www.lucidimagination.com/software_downloads/certified/LucidWorks.jar">downloads section</a></p>
<p><strong>Changes and additions from Apache Solr 1.4</strong></p>
<p>* SOLR-1902: Upgraded Tika to &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>We’re happy to announce the availability of the latest release of LucidWorks Certified Distribution for Solr.</p>
<p>LucidWorks for Solr v1.4.1 is a bugfix release for LucidWorks for Solr,  incorporating Apache Solr release 1.4.1, with a few additional bugfixes that were introduced after Solr 1.4.1.</p>
<p>LucidWorks for Solr v1.4.1 is tested to work with Windows, Linux and Mac.</p>
<p>Grab your copy from our <a title="Download LucidWorks" href="http://www.lucidimagination.com/software_downloads/certified/LucidWorks.jar">downloads section</a></p>
<p><strong>Changes and additions from Apache Solr 1.4</strong></p>
<p>* SOLR-1902: Upgraded Tika to 0.8-SNAPSHOT (Tommaso Teofili, gsingers)</p>
<p>* SOLR-2036: Avoid expensive fieldCache ram estimation for the admin stats page. (yonik)</p>
<p>* SOLR-2100: The replication handler backup command didn&#8217;t save the commit point and hence could fail when a newer commit caused the older commit point to be removed before it was finished being copied.  This did not affect normal master/slave replication.  (Peter Sturge via yonik)</p>
<p>* SOLR-2180: It was possible for EmbeddedSolrServer to leave searchers open if a request threw an exception. (yonik)</p>
<p>* SOLR-2192: StreamingUpdateSolrServer.blockUntilFinished was not thread safe and could throw an exception. (yonik)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/11/18/announcement-new-lucidworks-certified-distribution-for-solr-2/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Data.gov on Solr</title>
		<link>http://www.lucidimagination.com/blog/2010/11/05/data-gov-on-solr/</link>
		<comments>http://www.lucidimagination.com/blog/2010/11/05/data-gov-on-solr/#comments</comments>
		<pubDate>Fri, 05 Nov 2010 21:43:44 +0000</pubDate>
		<dc:creator>Erik Hatcher</dc:creator>
				<category><![CDATA[ApacheCon]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[Erik Hatcher]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2604</guid>
		<description><![CDATA[<p>At <a href="http://apachecon.com">ApacheCon</a> this week I presented <a href="http://na.apachecon.com/c/acna2010/sessions/571">&#8220;Rapid Prototyping with Solr&#8221;</a>.  This is the third time I&#8217;ve given a presentation with the same title.  In the spirit of the rapid prototyping theme, each time I&#8217;ve created a new prototype just a day or so prior to presenting it.  At <a href="http://lucene-eurocon.org/sessions-track2-day2.html#4">Lucene EuroCon</a> the prototype used attendee data, a treemap visualization, and a cute little Solr-powered &#8220;app&#8221; for picking attendees at random for the conference giveaways.  For &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>At <a href="http://apachecon.com">ApacheCon</a> this week I presented <a href="http://na.apachecon.com/c/acna2010/sessions/571">&#8220;Rapid Prototyping with Solr&#8221;</a>.  This is the third time I&#8217;ve given a presentation with the same title.  In the spirit of the rapid prototyping theme, each time I&#8217;ve created a new prototype just a day or so prior to presenting it.  At <a href="http://lucene-eurocon.org/sessions-track2-day2.html#4">Lucene EuroCon</a> the prototype used attendee data, a treemap visualization, and a cute little Solr-powered &#8220;app&#8221; for picking attendees at random for the conference giveaways.  For a recent <a href="http://www.lucidimagination.com/blog/2010/06/10/rapid-prototyping-search-applications-with-solr/">Lucid webinar</a> the prototype was more general purpose, bringing in and making searchable rich documents and faceting on file types with a pie chart visualization.</p>
<p>This time around, the data set I chose was <a href="http://www.data.gov/raw/92">Data.gov&#8217;s catalog of datasets</a>, which fit with the ApacheCon open source aura, and Lucid Imagination&#8217;s support of <a href="http://opensourceforamerica.org/awards/2010-recipients">Open Source for America</a>, which helps to advocate for open source in the US Federal Government.  The prototype includes faceted browsing, query term suggest, hit highlighting, result clustering, spell checking, document detail, and a bonus Venn diagram visualization.</p>
<p><span id="more-2604"></span></p>
<p>The prototype was built with these steps:</p>
<ol>
<li>Install LucidWorks for Solr</li>
<li>Grab the Data.gov catalog CSV file</li>
<li>Iterate a bit with Solr&#8217;s CSV update handler (the funnest way to get data into Solr) and a little Solr schema tinkering</li>
<li>Adjust the Solr configuration and UI templates to get a nice look and feel, adding in a document detail page and a Venn diagram visualization comparing query overlaps</li>
</ol>
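<p>For the CSV step, a small Python sketch of assembling a request URL for Solr&#8217;s CSV update handler might look like the following. The host, port and core path are assumptions about a default local install, and csv_update_url is a hypothetical helper; only the /update/csv path and the commit and separator parameters come from Solr itself.</p>

```python
from urllib.parse import urlencode


def csv_update_url(base_url, params):
    """Build the URL that a CSV file would be POSTed to.

    Parameters are sorted so the resulting URL is stable across runs.
    """
    query = urlencode(sorted(params.items()))
    return base_url.rstrip("/") + "/update/csv?" + query


# Assumed local Solr location; adjust to match your own install.
url = csv_update_url(
    "http://localhost:8983/solr",
    {"commit": "true", "separator": ","},
)
```

<p>The CSV file itself would then be sent as the body of a POST to that URL, with the schema adjusted between attempts as described in step 3.</p>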
<p>Voilà (click the images for large view):</p>
<table class="plain" style="width: 100%;" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="60%"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/11/datagov_search.png"><img class="alignnone size-thumbnail wp-image-2617" title="Data.gov on Solr" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/11/datagov_search-150x150.png" alt="" width="150" height="150" /></a></td>
<td><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/11/datagov_compare.png"><img class="size-thumbnail wp-image-2627" title="query comparison Venn diagram" src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/11/datagov_compare-150x150.png" alt="" width="150" height="150" /></a></td>
</tr>
</tbody>
</table>
<p>This isn&#8217;t the first time we&#8217;ve toyed with Data.gov data&#8230; earlier this year, <a href="../../../../../../blog/2010/05/07/data-mining-data-dot-gov/">Hoss demonstrated Solr&#8217;s stats component</a> on another of Data.gov&#8217;s data sets.</p>
<p>My ApacheCon slides are published at Slideshare and embedded here:</p>
<div id="__ss_5675936" style="width: 425px;"><strong><a title="Rapid prototyping with solr" href="http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-5675936">Rapid prototyping with solr</a></strong><object id="__sse5675936" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=rapidprototypingwithsolr-101105050018-phpapp01&amp;stripped_title=rapid-prototyping-with-solr-5675936&amp;userName=erikhatcher" /><param name="name" value="__sse5675936" /><param name="allowfullscreen" value="true" /><embed id="__sse5675936" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=rapidprototypingwithsolr-101105050018-phpapp01&amp;stripped_title=rapid-prototyping-with-solr-5675936&amp;userName=erikhatcher" name="__sse5675936" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p>All the code and instructions for running the entire prototype yourself can be found here: <a href="https://github.com/erikhatcher/solr-rapid-prototyping/tree/master/ApacheCon2010">https://github.com/erikhatcher/solr-rapid-prototyping/tree/master/ApacheCon2010</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/11/05/data-gov-on-solr/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Introducing LucidWorks Enterprise</title>
		<link>http://www.lucidimagination.com/blog/2010/10/14/introducing-lucidworks-enterprise/</link>
		<comments>http://www.lucidimagination.com/blog/2010/10/14/introducing-lucidworks-enterprise/#comments</comments>
		<pubDate>Thu, 14 Oct 2010 18:42:57 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucid Imagination Solutions]]></category>
		<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[ZooKeeper]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2535</guid>
		<description><![CDATA[<p>Last week at <a href="http://lucenerevolution.org/" target="_blank">Lucene Revolution</a> we announced <a href="http://www.lucidimagination.com/enterprise-search-solutions/lucidworks" target="_blank">LucidWorks Enterprise.</a> LucidWorks Enterprise is a commercially supported search platform that builds on the power of <a href="http://lucene.apache.org/" target="_blank">Apache Lucene</a> and <a href="http://lucene.apache.org/solr/" target="_blank">Solr</a> to deliver a flexible and scalable search platform.</p>
<p>Gee.  That almost sounds like a marketing guy wrote it.  Let me try again: LucidWorks Enterprise is software that lets you easily build great search applications.  You can install it, index some content, and search that content with just a &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Last week at <a href="http://lucenerevolution.org/" target="_blank">Lucene Revolution</a> we announced <a href="http://www.lucidimagination.com/enterprise-search-solutions/lucidworks" target="_blank">LucidWorks Enterprise.</a> LucidWorks Enterprise is a commercially supported search platform that builds on the power of <a href="http://lucene.apache.org/" target="_blank">Apache Lucene</a> and <a href="http://lucene.apache.org/solr/" target="_blank">Solr</a> to deliver a flexible and scalable search platform.</p>
<p>Gee.  That almost sounds like a marketing guy wrote it.  Let me try again: LucidWorks Enterprise is software that lets you easily build great search applications.  You can install it, index some content, and search that content with just a few keystrokes and clicks of the mouse.  Or, you can build a full-blown, integrated search application with custom plugins and your own user interface, all running on a fifty-node cluster.  And if you really need it, the full flexibility of Solr is right there, always accessible.<br />
<a href="http://www.lucidimagination.com/blog/wp-content/uploads/2010/10/Components.png"><img src="http://www.lucidimagination.com/blog/wp-content/uploads/2010/10/Components-300x171.png" alt="" title="Components" width="300" height="171" class="alignnone size-medium wp-image-2545" /></a><br />
LucidWorks Enterprise extends Solr to include some enterprise-grade features, just as Solr extends Lucene to provide server functionality.  By extending and embedding Solr, we get to use all of its powerful features without having to do what many folks do: fork the code base.  You see, we are really embedding Solr without changing it appreciably.  Where we do change it, we contribute our code back to the open source project.  A great example of that is SolrCloud, a feature we developed specifically for LucidWorks Enterprise.  The problem was that it was such a fundamental change to Solr that we would have to maintain a forked code-base forever, or simply contribute the code to the open source project.  We chose the latter, and that&#8217;s the choice we will continue to prefer for such fundamental and powerful features.</p>
<p>So, what&#8217;s in LucidWorks Enterprise?  Here&#8217;s a quick summary of the main features:</p>
<dl>
<dt>SolrCloud:</dt>
<dd>provides simplified configuration management, load balancing and failover in a distributed environment.  It&#8217;s the key feature if you have a cluster of search nodes.</dd>
<dt>User Interface:</dt>
<dd>we added a web-driven user interface to simplify the process of configuring the system, indexing content, and trying out searches.</dd>
<dt>CLICK Scoring:</dt>
<dd>ever wonder how to get better results on searches over your own content?  The CLICK scoring framework provides feedback from searchers back into the relevance ranking of documents, improving relevance for all.</dd>
<dt>ReST API:</dt>
<dd>all of the power of LucidWorks Enterprise is available programmatically, so you can build more thorough applications and automated administration.</dd>
<dt>Smarter Defaults:</dt>
<dd>LucidWorks Enterprise is configured out-of-the-box to be a great search application, with good relevance and a full list of features like spell checking, autocomplete, and unsupervised feedback.</dd>
<dt>Data Acquisition:</dt>
<dd>we have added a crawler and scheduler that simplifies the process of getting data into LucidWorks Enterprise.  Just point it at a Web site or set of files, or import data from a database.</dd>
</dl>
<p>So that&#8217;s LucidWorks Enterprise.  We&#8217;ll be talking more about it over the coming days.  This is the beginning of an exciting time for Lucid Imagination: we&#8217;re really excited to raise the bar with search and get our customers using the latest and greatest in search technology: <a href="http://www.lucidimagination.com/enterprise-search-solutions/lucidworks" target="_blank">LucidWorks Enterprise.</a>  It&#8217;s free for developers, so <a href="http://www.lucidimagination.com/developers/lucidworks-enterprise-developer-access-release">download LucidWorks Enterprise now!</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/10/14/introducing-lucidworks-enterprise/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

