<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; Ruby</title>
	<atom:link href="http://www.lucidimagination.com/blog/category/ruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Indexing rich files into Solr, quickly and easily</title>
		<link>http://www.lucidimagination.com/blog/2011/08/31/indexing-rich-files-into-solr-quickly-and-easily/</link>
		<comments>http://www.lucidimagination.com/blog/2011/08/31/indexing-rich-files-into-solr-quickly-and-easily/#comments</comments>
		<pubDate>Wed, 31 Aug 2011 14:07:53 +0000</pubDate>
		<dc:creator>Erik Hatcher</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Tika]]></category>
		<category><![CDATA[Erik Hatcher]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3885</guid>
		<description><![CDATA[<p>This past weekend I presented yet another &#8220;Rapid Prototyping with Solr&#8221; presentation, this time back in the saddle with the <a title="No Fluff, Just Stuff - Raleigh, August 2011" href="http://www.nofluffjuststuff.com/conference/raleigh/2011/08/home" target="_blank">No Fluff, Just Stuff symposium in Raleigh, NC</a>. I intentionally waited until the last minute to hack together a quick script to index some data I haven&#8217;t indexed before to demonstrate the ease at which one can grab Solr and immediately make some use out of it. This time around I cobbled together a &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>This past weekend I gave yet another &#8220;Rapid Prototyping with Solr&#8221; presentation, this time back in the saddle with the <a title="No Fluff, Just Stuff - Raleigh, August 2011" href="http://www.nofluffjuststuff.com/conference/raleigh/2011/08/home" target="_blank">No Fluff, Just Stuff symposium in Raleigh, NC</a>. I intentionally waited until the last minute to hack together a quick script to index some data I hadn&#8217;t indexed before, to demonstrate the ease with which one can grab Solr and immediately make some use out of it. This time around I cobbled together a simple Ruby script to index a directory full of rich (PDF, HTML, Word, etc.) documents into a fresh Solr 3.3.0 install. Only a few seconds later I had my documents indexed, and even searchable through a user interface.</p>
<p>Here are the steps I took:</p>
<ol>
<li>Download and &#8220;install&#8221; (aka unzip) Apache Solr 3.3.0</li>
<li>Launch Solr (cd example; java -jar start.jar)</li>
<li>Index files</li>
</ol>
<p>That&#8217;s it.  Here&#8217;s the indexing script I used:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'net/http'</span>
&nbsp;
<span style="color:#0066ff; font-weight:bold;">@dir</span> = <span style="color:#CC00FF; font-weight:bold;">Dir</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;/Users/erikhatcher/apache-solr-3.3.0/docs&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
<span style="color:#0066ff; font-weight:bold;">@url</span> = <span style="color:#CC00FF; font-weight:bold;">URI</span>.<span style="color:#9900CC;">parse</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;http://localhost:8983/solr&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#0066ff; font-weight:bold;">@connection</span> = <span style="color:#6666ff; font-weight:bold;">Net::HTTP</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>@url.<span style="color:#9900CC;">host</span>, <span style="color:#0066ff; font-weight:bold;">@url</span>.<span style="color:#9900CC;">port</span><span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">def</span> index<span style="color:#006600; font-weight:bold;">&#40;</span>filename<span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#0066ff; font-weight:bold;">@connection</span>.<span style="color:#9900CC;">get</span><span style="color:#006600; font-weight:bold;">&#40;</span>@url.<span style="color:#9900CC;">path</span> <span style="color:#006600; font-weight:bold;">+</span> <span style="color:#996600;">&quot;/update/extract?stream.file=#{filename}&amp;amp;literal.id=#{filename}&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">def</span> commit
<span style="color:#0066ff; font-weight:bold;">@connection</span>.<span style="color:#9900CC;">get</span><span style="color:#006600; font-weight:bold;">&#40;</span>@url.<span style="color:#9900CC;">path</span> <span style="color:#006600; font-weight:bold;">+</span> <span style="color:#996600;">&quot;/update?commit=true&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
<span style="color:#0066ff; font-weight:bold;">@dir</span>.<span style="color:#9900CC;">each</span> <span style="color:#006600; font-weight:bold;">&#123;</span><span style="color:#006600; font-weight:bold;">|</span>name<span style="color:#006600; font-weight:bold;">|</span>
  f = <span style="color:#996600;">&quot;#{@dir.path}/#{name}&quot;</span>
  <span style="color:#9966CC; font-weight:bold;">if</span> <span style="color:#CC00FF; font-weight:bold;">File</span>.<span style="color:#9900CC;">file</span>?<span style="color:#006600; font-weight:bold;">&#40;</span>f<span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Indexing #{f}...&quot;</span>
    index<span style="color:#006600; font-weight:bold;">&#40;</span>f<span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#006600; font-weight:bold;">&#125;</span>
&nbsp;
<span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Committing...&quot;</span>
commit
&nbsp;
<span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Done!&quot;</span></pre></div></div>

<p>To make it look prettier, only a little dabbling with the templates is needed &#8211; add your company logo, customize the colors. And a change to the example (/browse handler) configuration to facet on content_type will allow you to easily search just within documents of specific types through the included UI.  The example code above indexed the docs that ship with Apache Solr 3.3.0; just change the path to a directory of yours to index your own content.</p>
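<p>One caveat with the script above: it interpolates the file path straight into the query string, so a path containing a space or an ampersand would produce a malformed request. A minimal sketch of one way to guard against that, using Ruby&#8217;s standard URI.encode_www_form, follows. The extract_path helper and the sample file name are hypothetical and not part of the original script.</p>

```ruby
require 'uri'

# Hypothetical helper (not part of the script above): builds the request path
# for Solr's /update/extract handler with both parameters form-encoded.
def extract_path(base_path, filename)
  query = URI.encode_www_form('stream.file' => filename, 'literal.id' => filename)
  "#{base_path}/update/extract?#{query}"
end

# Example with a made-up file name containing a space and an ampersand.
puts extract_path('/solr', '/tmp/my docs/q&a.pdf')
```

<p>In the script, the index method could then pass extract_path(@url.path, filename) to @connection.get instead of building the string by hand.</p>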
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/08/31/indexing-rich-files-into-solr-quickly-and-easily/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Solr Search User Interface Examples</title>
		<link>http://www.lucidimagination.com/blog/2010/01/14/solr-search-user-interface-examples/</link>
		<comments>http://www.lucidimagination.com/blog/2010/01/14/solr-search-user-interface-examples/#comments</comments>
		<pubDate>Thu, 14 Jan 2010 14:28:53 +0000</pubDate>
		<dc:creator>Erik Hatcher</dc:creator>
				<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1501</guid>
		<description><![CDATA[<p>A recent Slashdot poster asked for Solr-powered <a href="http://ask.slashdot.org/story/10/01/13/2014230/Attractive-Open-Source-Search-Interfaces">&#8220;Attractive Open Source Search Interfaces&#8221;</a>.  First, for some inspiration on what you might want to have in a search user interface, check out <a href="http://www.flickr.com/photos/morville/collections/72157603785835882/">Peter Morville&#8217;s excellent set of screenshot examples</a>.  One of <a href="http://ask.slashdot.org/story/10/01/13/2014230/Attractive-Open-Source-Search-Interfaces">my favorite examples</a> is, of course, from the library space.  <a href="http://www.flickr.com/photos/morville/sets/72157603794374821/">Morville showcases the NCSU library system site</a> on one of his sets:</p>
<p></p>
<p>Several Solr-powered open source faceted navigation search systems for libraries have been &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>A recent Slashdot poster asked for Solr-powered <a href="http://ask.slashdot.org/story/10/01/13/2014230/Attractive-Open-Source-Search-Interfaces">&#8220;Attractive Open Source Search Interfaces&#8221;</a>.  First, for some inspiration on what you might want to have in a search user interface, check out <a href="http://www.flickr.com/photos/morville/collections/72157603785835882/">Peter Morville&#8217;s excellent set of screenshot examples</a>.  One of <a href="http://ask.slashdot.org/story/10/01/13/2014230/Attractive-Open-Source-Search-Interfaces">my favorite examples</a> is, of course, from the library space.  <a href="http://www.flickr.com/photos/morville/sets/72157603794374821/">Morville showcases the NCSU library system site</a> on one of his sets:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="300" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="flashvars" value="offsite=true&amp;lang=en-us&amp;page_show_url=%2Fphotos%2Fmorville%2Fsets%2F72157603794374821%2Fshow%2F&amp;page_show_back_url=%2Fphotos%2Fmorville%2Fsets%2F72157603794374821%2F&amp;set_id=72157603794374821&amp;jump_to=" /><param name="allowFullScreen" value="true" /><param name="src" value="http://www.flickr.com/apps/slideshow/show.swf?v=71649" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://www.flickr.com/apps/slideshow/show.swf?v=71649" allowfullscreen="true" flashvars="offsite=true&amp;lang=en-us&amp;page_show_url=%2Fphotos%2Fmorville%2Fsets%2F72157603794374821%2Fshow%2F&amp;page_show_back_url=%2Fphotos%2Fmorville%2Fsets%2F72157603794374821%2F&amp;set_id=72157603794374821&amp;jump_to="></embed></object></p>
<p>Several Solr-powered open source faceted navigation search systems for libraries have been built with various technologies:  <a href="http://projectblacklight.org/">Blacklight</a> (Ruby on Rails), <a href="http://vufind.org/">VuFind</a> (PHP), <a href="http://code.google.com/p/kochief/">Kochief</a> (Django), <a href="http://code.google.com/p/multifacet/">MULtifacet</a> (Drupal). The question is, how general purpose are these user interfaces for non-library uses?  In theory they could all be repurposed in this way, as every library really has a need to customize the UI.  Blacklight, for example, is written up in the <a href="http://www.lucidimagination.com/blog/2010/01/11/book-review-solr-packt-book/">Solr 1.4 book (by Smiley and Pugh)</a> with a showcase that works on their MusicBrainz example.</p>
<p>The tough part of generalizing a search UI is that what we all really want is a custom-for-us UI, one that is as flexible as our imagination.  <strong>And</strong> it must fit pragmatically into the technology constraints of our operation.  For some, Ruby on Rails is the ONLY way to go, for others a Java-based UI tier is the only technology that fits.</p>
<p>Here are some pointers to various other UI technologies on top of Solr:</p>
<ul>
<li><a href="http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/">Solritas</a> &#8211; Apache Velocity templating.  Available, with some config, in Solr 1.4.</li>
<li><a href="http://code4lib.org/node/154">Solr Flare</a> &#8211; a proof of concept RoR UI plugin that does Ajax suggest, faceted navigation, saved (session-based) searches, and more.</li>
<li><a href="http://github.com/evolvingweb/ajax-solr">AJAX Solr</a> &#8211; JavaScript, purely client-side widgets</li>
</ul>
<p>These are covered a bit with screenshots in my <a href="http://www.lucidimagination.com/blog/2009/08/20/edui-conference-solr-flair-search-user-interfaces-powered-by-apache-solr/">EdUI presentation &#8220;Solr Flair: Search User Interfaces Powered by Apache Solr&#8221;</a>.</p>
<p>What other open source UI frameworks live on top of Solr?  Add a comment with a pointer!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/01/14/solr-search-user-interface-examples/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>EdUI Conference &#8211;  Solr Flair: Search User Interfaces Powered by Apache Solr</title>
		<link>http://www.lucidimagination.com/blog/2009/08/20/edui-conference-solr-flair-search-user-interfaces-powered-by-apache-solr/</link>
		<comments>http://www.lucidimagination.com/blog/2009/08/20/edui-conference-solr-flair-search-user-interfaces-powered-by-apache-solr/#comments</comments>
		<pubDate>Thu, 20 Aug 2009 19:14:02 +0000</pubDate>
		<dc:creator>Erik Hatcher</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucid Imagination]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=907</guid>
		<description><![CDATA[[ Monday, 21 September 2009 to Tuesday, 22 September 2009. ] <p>I will be presenting <a href="http://www.eduiconf.org/session/solr-flair-hatcher/">&#8220;Solr Flair: Search User Interfaces Powered by Apache Solr&#8221;</a> at the upcoming <a href="http://www.eduiconf.org">EdUI 2009 Conference</a>.  I&#8217;m honored to be finally speaking at the same conference as my great friend and UI mentor, <a href="http://molly.com">Molly Holzschlag</a>.  I&#8217;ve got my work cut out for me to come up with a presentation worthy for a &#8220;UI&#8221; conference with folks of this caliber &#8230;</p>]]></description>
			<content:encoded><![CDATA[[ Monday, 21 September 2009 to Tuesday, 22 September 2009. ] <p>I will be presenting <a href="http://www.eduiconf.org/session/solr-flair-hatcher/">&#8220;Solr Flair: Search User Interfaces Powered by Apache Solr&#8221;</a> at the upcoming <a href="http://www.eduiconf.org">EdUI 2009 Conference</a>.  I&#8217;m honored to be finally speaking at the same conference as my great friend and UI mentor, <a href="http://molly.com">Molly Holzschlag</a>.  I&#8217;ve got my work cut out for me to come up with a presentation worthy of a &#8220;UI&#8221; conference with folks of this caliber headlining &#8211; I&#8217;m even looking forward to the goodies I&#8217;ll try to pull out of my hat.  Here&#8217;s the abstract:</p>
<blockquote><p>Solr powers library, government, and enterprise search systems in thousands of applications.  This talk will showcase the various technologies and techniques used to build effective user search, browse, and find interfaces on top of Solr.  Several of the full featured open-source library Solr front-ends will be shown, including Blacklight and VuFind.  We&#8217;ll also demonstrate several front-end frameworks including:</p>
<p>• SolrJS &#8211; a JavaScript widget library<br />
• Solr Flare &#8211; a Ruby on Rails plugin featuring Simile Timeline integration, Ajax suggest, and more<br />
• Solritas &#8211; a builtin lightweight UI templating framework</p></blockquote>
<div style="width:425px;text-align:left" id="__ss_2042032"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" href="http://www.slideshare.net/erikhatcher/solr-flair-search-user-interfaces-powered-by-apache-solr" title="Solr Flair: Search User Interfaces Powered by Apache Solr">Solr Flair: Search User Interfaces Powered by Apache Solr</a><object style="margin:0px" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=solrflair-090922122758-phpapp02&#038;stripped_title=solr-flair-search-user-interfaces-powered-by-apache-solr" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=solrflair-090922122758-phpapp02&#038;stripped_title=solr-flair-search-user-interfaces-powered-by-apache-solr" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="font-size:11px;font-family:tahoma,arial;height:26px;padding-top:2px;">View more <a style="text-decoration:underline;" href="http://www.slideshare.net/">documents</a> from <a style="text-decoration:underline;" href="http://www.slideshare.net/erikhatcher">Erik Hatcher</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/08/20/edui-conference-solr-flair-search-user-interfaces-powered-by-apache-solr/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>acts_as_solr with rich document indexing</title>
		<link>http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/</link>
		<comments>http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/#comments</comments>
		<pubDate>Wed, 18 Feb 2009 07:00:36 +0000</pubDate>
		<dc:creator>Erik Hatcher</dc:creator>
				<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[acts_as_solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=89</guid>
		<description><![CDATA[<p>A hearty thanks to the <a href="http://cvreg.org/">Central Virginia Ruby Enthusiasts&#8217; Group</a>, who invited me to speak on Solr+Ruby giving me a good reason to delve deeply back into solr-ruby and acts_as_solr.</p>
<p>Let&#8217;s start a Rails project from scratch to illustrate how simple it is to get up and running with acts_as_solr.  The example is, <a href="http://www.lucidimagination.com/Community/Marketplace/Jobs-in-Lucene-and-Solr/">indeed</a>, a fairly real-world&#8217;ish type of need.   We&#8217;re going to index resumes, which could be in standard rich document formats &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>A hearty thanks to the <a href="http://cvreg.org/">Central Virginia Ruby Enthusiasts&#8217; Group</a>, who invited me to speak on Solr+Ruby giving me a good reason to delve deeply back into solr-ruby and acts_as_solr.</p>
<p>Let&#8217;s start a Rails project from scratch to illustrate how simple it is to get up and running with acts_as_solr.  The example is, <a href="http://www.lucidimagination.com/Community/Marketplace/Jobs-in-Lucene-and-Solr/">indeed</a>, a fairly real-world&#8217;ish type of need.   We&#8217;re going to index resumes, which could be in standard rich document formats such as PDF, Word, HTML, or plain text.</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">rails resume
<span style="color: #7a0874; font-weight: bold;">cd</span> resume
script<span style="color: #000000; font-weight: bold;">/</span>generate scaffold resume first_name:string last_name:string file_name:string
rake db:migrate</pre></div></div>

<p>Thanks to the magic that is Rails, we now have a working application that allows standard CRUD operations on a resumes table in a relational database.  (Not discussed further here, but you can start script/server and navigate to the usual http://localhost:3000/resumes.  We&#8217;re going to stick closer to the metal and use script/console for direct ActiveRecord and Solr API tinkering.)</p>
<p>Next we add the acts_as_solr plugin to our application:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">script<span style="color: #000000; font-weight: bold;">/</span>plugin <span style="color: #c20cb9; font-weight: bold;">install</span> <span style="color: #c20cb9; font-weight: bold;">git</span>:<span style="color: #000000; font-weight: bold;">//</span>github.com<span style="color: #000000; font-weight: bold;">/</span>mattmatt<span style="color: #000000; font-weight: bold;">/</span>acts_as_solr.git</pre></div></div>

<blockquote><p>A note about the acts_as_solr codebase: it all started with an <a href="http://www.lucidimagination.com/search/document/b413a95a388d0383/acts_as_solr">innocent hack that I posted to the solr-user list</a>.  It got picked up [editor 3/18/09: respectfully added a special mention of <a href="http://www.workingwithrails.com/person/6599-thiago-jackiw">Thiago Jackiw</a>] by Thiago and he turned it into a serious general purpose ActiveRecord modeling plugin <a href="http://acts-as-solr.rubyforge.org/">hosted at RubyForge</a>, which now exists as numerous git repository forks.  The currently best maintained version is <a href="http://github.com/mattmatt/acts_as_solr/tree/master">Mathias Meyer&#8217;s branch</a>.</p></blockquote>
<p>And we start Solr:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">rake solr:start</pre></div></div>

<p>We now add Solr to the lifecycle of the Resume model, such that when a Resume is added or updated in the database it also gets indexed into Solr, and deleted from Solr when it is removed from the database.  It really couldn&#8217;t be any easier:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">class</span> Resume <span style="color:#006600; font-weight:bold;">&lt;</span> <span style="color:#6666ff; font-weight:bold;">ActiveRecord::Base</span>
  acts_as_solr
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>Plugging in acts_as_solr provides not only the lifecycle hooks to keep the database and Solr in sync, it also provides additional finder methods.  Here&#8217;s an example of using Resume#find_by_solr:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">$ script<span style="color:#006600; font-weight:bold;">/</span>console
<span style="color:#006600; font-weight:bold;">&gt;&gt;</span> Resume.<span style="color:#9900CC;">create</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:first_name</span><span style="color:#006600; font-weight:bold;">=&gt;</span>;<span style="color:#996600;">'Joe'</span>, <span style="color:#ff3333; font-weight:bold;">:last_name</span><span style="color:#006600; font-weight:bold;">=&gt;</span><span style="color:#996600;">'Programmer'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#006600; font-weight:bold;">&gt;&gt;</span> Resume.<span style="color:#9900CC;">find_by_solr</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;program*&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006600; font-weight:bold;">&lt;</span><span style="color:#008000; font-style:italic;">#1.0460204, :query_time=&gt;0, :total=&gt;1, :docs=&gt;[#&lt;Resume id: 4, first_name: &quot;Joe&quot;, last_name: &quot;Programmer&quot;, file_name: nil, created_at: &quot;2009-02-13 23:15:29&quot;, updated_at: &quot;2009-02-13 23:15:29&quot;&gt;]}&gt;</span></pre></div></div>

<p>The &#8220;program*&#8221; query matches any words indexed that begin with &#8220;program&#8221;.  Note that the result from find_by_solr is an ActsAsSolr::SearchResults instance.  This wrapper provides the docs that normally are returned from ActiveRecord finder methods in addition to other Solr information, including the query_time and total number of documents matched.  The order of the docs array defaults to descending score (a measure of relevancy to a query).</p>
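<p>To make the shape of that wrapper a little more concrete, here is a small sketch of reading the pieces mentioned above (docs, total, and query_time). So that it runs without Rails or Solr, it uses plain Struct stand-ins; SearchResultsSketch and ResumeSketch are hypothetical names for illustration only, not acts_as_solr classes, and the real wrapper may expose more than is shown here.</p>

```ruby
# Stand-ins for the search results wrapper and the Resume model (hypothetical,
# so the sketch runs without Rails or Solr).
SearchResultsSketch = Struct.new(:docs, :total, :query_time)
ResumeSketch = Struct.new(:first_name, :last_name)

# Mirrors the console session above: one matching resume, query time of 0.
results = SearchResultsSketch.new([ResumeSketch.new('Joe', 'Programmer')], 1, 0)

# The docs are ordinary model objects; total and query_time come from Solr.
summary = "#{results.total} match(es), query time #{results.query_time}"
names = results.docs.map { |resume| "#{resume.first_name} #{resume.last_name}" }

puts summary
puts names
```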
<p>So far so good &#8211; we&#8217;ve got a Rails application with an ActiveRecord model tied to acts_as_solr.  Now comes the trickier part of indexing the resume text.</p>
<blockquote><p><strong>Solr Cell</strong><br />
A content extraction library (aka Solr Cell) was added in Solr 1.4.  However, at the time of writing acts_as_solr embeds Solr 1.3.  So we need to do a little hacking to bring in a newer version of Solr with the Solr Cell dependencies and configuration.  In the future, it is likely acts_as_solr will ship with Solr Cell built-in, so be sure to check your version.</p></blockquote>
<p>First, stop Solr:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">rake solr:stop</pre></div></div>

<p>Grab a nightly build of Solr from <a href="http://people.apache.org/builds/lucene/solr/nightly/">http://people.apache.org/builds/lucene/solr/nightly/</a>.  Unarchive the distribution, and copy over the lib directory containing the Solr Cell plugin and dependencies, and also replace solr.war (the entire Solr web application).</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">cp</span> <span style="color: #660033;">-R</span> apache-solr-nightly<span style="color: #000000; font-weight: bold;">/</span>example<span style="color: #000000; font-weight: bold;">/</span>solr<span style="color: #000000; font-weight: bold;">/</span>lib resume<span style="color: #000000; font-weight: bold;">/</span>vendor<span style="color: #000000; font-weight: bold;">/</span>plugins<span style="color: #000000; font-weight: bold;">/</span>acts_as_solr<span style="color: #000000; font-weight: bold;">/</span>solr<span style="color: #000000; font-weight: bold;">/</span>solr<span style="color: #000000; font-weight: bold;">/</span>
<span style="color: #c20cb9; font-weight: bold;">cp</span> apache-solr-nightly<span style="color: #000000; font-weight: bold;">/</span>example<span style="color: #000000; font-weight: bold;">/</span>webapps<span style="color: #000000; font-weight: bold;">/</span>solr.war resume<span style="color: #000000; font-weight: bold;">/</span>vendor<span style="color: #000000; font-weight: bold;">/</span>plugins<span style="color: #000000; font-weight: bold;">/</span>acts_as_solr<span style="color: #000000; font-weight: bold;">/</span>solr<span style="color: #000000; font-weight: bold;">/</span>webapps<span style="color: #000000; font-weight: bold;">/</span>solr.war</pre></div></div>

<p>And now add the Solr Cell request handler to vendor/plugins/acts_as_solr/solr/solr/conf/solrconfig.xml (add it anywhere as sibling to the other request handlers defined):</p>
<p>And enable remote streaming by setting enableRemoteStreaming=&#8221;true&#8221; on the requestParsers element.</p>
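<p>For reference, the two solrconfig.xml changes described above might look roughly like the fragment below. Treat it as a sketch rather than the exact configuration used for this post: the handler name is taken from the /update/extract URL used later on, the class name is the ExtractingRequestHandler class that the Solr Cell wiki page documents, and the multipartUploadLimitInKB value is an assumption carried over from the stock example config, so check all of it against your own nightly build.</p>

```xml
<!-- Solr Cell: register the extracting handler alongside the other request handlers -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler" />

<!-- Inside the existing <requestDispatcher> element: allow stream.file / stream.url -->
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />
```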
<blockquote><p>Enabling remote streaming comes with a stern warning &#8220;Make sure your system has some authentication before enabling remote streaming!&#8221;.  Our best advice is to firewall Solr such that only the application server, or in this example simply localhost itself, can make requests to Solr.  Having remote streaming enabled allows some request handlers, if configured, to pull content from a URL or from a local file path.  This isn&#8217;t necessarily a bad thing, but restricting who or where requests can be made to Solr is a wise production deployment consideration.  Even with remote streaming disabled, general /update is accessible and documents can be added or deleted easily.  So do take this as a production deployment concern to address in your network architecture.</p></blockquote>
<p>What this now gives us is the ability to index rich document content with simple requests to Solr.  Thanks to Solr&#8217;s content streaming flexibility, Solr can get the file content from a local file path, a remotely accessible URL, or through the file actually being POSTed in the request.  In this exercise, we&#8217;re going to send Solr a local file path, which assumes the Solr and Ruby ActiveRecord application tier can see the same path.  Here&#8217;s an example of the kind of lightweight request it takes to index a PDF file:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">curl <span style="color: #ff0000;">&quot;http://localhost:8982/solr/update/extract?stream.file=/path/to/ErikHatcherResume.pdf&amp;ext.idx.attr=false&amp;ext.def.fl=text_t&amp;ext.ignore.und.fl=true&amp;ext.map.title=title_t&amp;ext.literal.id=1&amp;wt=ruby&quot;</span></pre></div></div>

<p>Those are some ugly parameters, but thankfully <a href="http://wiki.apache.org/solr/ExtractingRequestHandler">the Solr Cell wiki page</a> spells them out in detail.  Here is the Solr request in prose: the local /path/to/ErikHatcherResume.pdf is sent to Solr, Solr reads the contents of that file, the text is extracted into the text_t field, undefined fields are ignored, general attributes extracted are ignored, but the title field is mapped to the title_t field, and the id field is set literally to the value of 1.  The general purpose acts_as_solr schema has a convenient *_t field mapping for bringing in both the text content and metadata attributes as needed, and all *_t fields are internally copied to a single searchable &#8220;text&#8221; field.</p>
<p>The solr-ruby library, at the time of this writing, does not have built-in support for Solr Cell style requests, though it easily allows custom request types to be used.  Here&#8217;s our solr_cell_request.rb:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">class</span> SolrCellRequest <span style="color:#006600; font-weight:bold;">&lt;</span> <span style="color:#6666ff; font-weight:bold;">Solr::Request</span>::<span style="color:#CC0066; font-weight:bold;">Select</span>
  <span style="color:#9966CC; font-weight:bold;">def</span> initialize<span style="color:#006600; font-weight:bold;">&#40;</span>doc,file_name<span style="color:#006600; font-weight:bold;">&#41;</span>
    params = <span style="color:#006600; font-weight:bold;">&#123;</span>
      <span style="color:#996600;">'ext.idx.attr'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#0000FF; font-weight:bold;">false</span>,        <span style="color:#008000; font-style:italic;"># don't index any attributes, unless explicitly mapped</span>
      <span style="color:#996600;">'ext.def.fl'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#996600;">'text_t'</span>,        <span style="color:#008000; font-style:italic;"># all text extracted goes to text_t (since it is a stored field, for highlighting)</span>
      <span style="color:#996600;">'ext.ignore.und.fl'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#0000FF; font-weight:bold;">true</span>,      <span style="color:#008000; font-style:italic;"># ignore all undefined fields</span>
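      <span style="color:#996600;">'ext.map.title'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#996600;">'title_t'</span>,        <span style="color:#008000; font-style:italic;"># map the extracted title attribute to title_t</span>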
      <span style="color:#996600;">'ext.resource.name'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> file_name, <span style="color:#008000; font-style:italic;"># TIKA-154 workaround</span>
      <span style="color:#996600;">'stream.file'</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> file_name,
    <span style="color:#006600; font-weight:bold;">&#125;</span>
    doc.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>f<span style="color:#006600; font-weight:bold;">|</span>
      params<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;ext.literal.#{f.name}&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span> = f.<span style="color:#9900CC;">value</span>
      <span style="color:#9966CC; font-weight:bold;">if</span> f.<span style="color:#9900CC;">boost</span>
        params<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;ext.boost.#{f.name}&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span> = f.<span style="color:#9900CC;">boost</span>
      <span style="color:#9966CC; font-weight:bold;">end</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
    <span style="color:#9966CC; font-weight:bold;">super</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#0000FF; font-weight:bold;">nil</span>,params<span style="color:#006600; font-weight:bold;">&#41;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> handler
    <span style="color:#996600;">'update/extract'</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">class</span> SolrCellResponse <span style="color:#006600; font-weight:bold;">&lt;</span> <span style="color:#6666ff; font-weight:bold;">Solr::Response::Ruby</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<blockquote><p>During the development of SolrCellRequest, I noticed that plain text files (surprisingly!) were not indexing.  I asked about this and quickly <a href="http://www.lucidimagination.com/search/document/42adff3d3ac63bd9/solr_cell_extractingrequesthandler_and_plain_text_files">received an explanation</a>.  This will be resolved when a newer version of Tika, including <a href="http://issues.apache.org/jira/browse/TIKA-154">TIKA-154</a>, is brought into Solr.  In the meantime, setting ext.resource.name solves the issue.</p></blockquote>
<p>The doc passed into the SolrCellRequest constructor is a Solr::Document.  We&#8217;ve jumped ahead, knowing that we&#8217;ll be able to easily override an acts_as_solr method where the ActiveRecord instance is available as a Solr::Document.  (Note that solr-ruby requires a parallel *Response class for each *Request class, which is why the dummy SolrCellResponse is necessary.)</p>
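<p>The field-to-parameter mapping inside the constructor can be tried without Solr running. The sketch below is only an illustration: <code>Field</code> is a stand-in Struct I made up in place of the fields of a Solr::Document, and <code>literal_params</code> is a hypothetical helper that mirrors the loop in SolrCellRequest.</p>

```ruby
# Stand-in for a solr-ruby field (an assumption for this sketch); the real
# request iterates over the fields of a Solr::Document instead.
Field = Struct.new(:name, :value, :boost)

# Hypothetical helper mirroring the loop in SolrCellRequest#initialize:
# every field becomes an ext.literal.* parameter, and a boost is only
# emitted when the field actually has one.
def literal_params(fields)
  fields.each_with_object({}) do |f, params|
    params["ext.literal.#{f.name}"] = f.value
    params["ext.boost.#{f.name}"] = f.boost if f.boost
  end
end

fields = [Field.new(:id, 1, nil), Field.new(:first_name_t, 'Erik', 2.0)]
p literal_params(fields)
```

<p>Fields without a boost contribute only their ext.literal.* entry, so the request carries no empty boost parameters.</p>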
<p>script/console is still our friend, so let&#8217;s give it a try using the pure solr-ruby API:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">$ script<span style="color: #000000; font-weight: bold;">/</span>console
Loading development environment <span style="color: #7a0874; font-weight: bold;">&#40;</span>Rails 2.2.2<span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #000000; font-weight: bold;">&gt;&gt;</span> solr = Solr::Connection.new<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #ff0000;">&quot;http://localhost:8982/solr&quot;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #000000; font-weight: bold;">&gt;&gt;</span> req = SolrCellRequest.new<span style="color: #7a0874; font-weight: bold;">&#40;</span>Solr::Document.new<span style="color: #7a0874; font-weight: bold;">&#40;</span>:<span style="color: #007800;">id</span>=<span style="color: #000000; font-weight: bold;">&gt;</span><span style="color: #000000;">1</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>, <span style="color: #ff0000;">'/path/to/ErikHatcherResume.pdf'</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #000000; font-weight: bold;">&gt;&gt;</span> solr.send<span style="color: #7a0874; font-weight: bold;">&#40;</span>req<span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #000000; font-weight: bold;">&gt;&gt;</span> solr.commit</pre></div></div>

<p><strong>Checkpoint</strong> &#8211; we&#8217;ve now got Ruby able to index rich files into Solr through a very simple API.  What&#8217;s left?  We have to tie this indexing into the ActiveRecord lifecycle exposed by acts_as_solr.  There&#8217;s a nice and easy method to override on a per-acts_as_solr-model basis to change how the indexing request to Solr works.  It looks like this (in acts_as_solr&#8217;s common_methods.rb):</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">    <span style="color:#9966CC; font-weight:bold;">def</span> solr_add<span style="color:#006600; font-weight:bold;">&#40;</span>add_xml<span style="color:#006600; font-weight:bold;">&#41;</span>   <span style="color:#008000; font-style:italic;"># note, it is actually a Solr::Document passed in, not XML</span>
      <span style="color:#6666ff; font-weight:bold;">ActsAsSolr::Post</span>.<span style="color:#9900CC;">execute</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#6666ff; font-weight:bold;">Solr::Request::AddDocument</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>add_xml<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>

<p>We&#8217;ll override that in our Resume model:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#9966CC; font-weight:bold;">class</span> Resume <span style="color:#006600; font-weight:bold;">&lt;</span> <span style="color:#6666ff; font-weight:bold;">ActiveRecord::Base</span>
  acts_as_solr
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">def</span> solr_add<span style="color:#006600; font-weight:bold;">&#40;</span>doc<span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#008000; font-style:italic;"># puts doc.to_xml.to_s # handy view of the Solr doc acts_as_solr builds</span>
    <span style="color:#9966CC; font-weight:bold;">if</span> file_name
      <span style="color:#6666ff; font-weight:bold;">ActsAsSolr::Post</span>.<span style="color:#9900CC;">execute</span><span style="color:#006600; font-weight:bold;">&#40;</span>SolrCellRequest.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>doc, file_name<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#9966CC; font-weight:bold;">else</span>
      <span style="color:#6666ff; font-weight:bold;">ActsAsSolr::Post</span>.<span style="color:#9900CC;">execute</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#6666ff; font-weight:bold;">Solr::Request::AddDocument</span>.<span style="color:#9900CC;">new</span><span style="color:#006600; font-weight:bold;">&#40;</span>doc<span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
    <span style="color:#9966CC; font-weight:bold;">end</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>
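<p>One thing to watch in that override: Ruby treats an empty string as truthy, so a record saved with a blank file_name would still take the Solr Cell path. Here is a minimal sketch of a stricter guard, under the assumption that blank paths should fall back to a plain add. The two Structs are stand-ins for SolrCellRequest and Solr::Request::AddDocument, and <code>request_for</code> is a hypothetical helper, not part of acts_as_solr.</p>

```ruby
# Stand-ins for the two request classes (assumptions for this sketch).
CellRequest = Struct.new(:doc, :file_name)
AddRequest  = Struct.new(:doc)

# Hypothetical helper: pick the request type the way Resume#solr_add does,
# but treat nil, empty, and whitespace-only file names as "no file".
def request_for(doc, file_name)
  if file_name.to_s.strip.empty?
    AddRequest.new(doc)
  else
    CellRequest.new(doc, file_name)
  end
end

puts request_for({ :id => 1 }, '/path/to/ErikHatcherResume.pdf').class
puts request_for({ :id => 2 }, '').class
```

<p>Whether a blank file_name should be an error instead is a judgment call for your own model.</p>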

<p>And now the grand finale, the code we Rubyists love to see, that <em>one</em> elegant line of code:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#006600; font-weight:bold;">&gt;&gt;</span> Resume.<span style="color:#9900CC;">create</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#ff3333; font-weight:bold;">:first_name</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#996600;">&quot;Erik&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:last_name</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#996600;">&quot;Hatcher&quot;</span>, <span style="color:#ff3333; font-weight:bold;">:file_name</span> <span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#996600;">&quot;/path/to/ErikHatcherResume.pdf&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span></pre></div></div>

<p>And a quick test that shows it works (&#8220;java&#8221; is in my resume):</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#006600; font-weight:bold;">&gt;&gt;</span> Resume.<span style="color:#9900CC;">find_by_solr</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;java&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#006600; font-weight:bold;">=&gt;</span> <span style="color:#006600; font-weight:bold;">&lt;</span><span style="color:#008000; font-style:italic;">#1, :total=&gt;1, :docs=&gt;[#&lt;Resume id: 6, first_name: &quot;Erik&quot;, last_name: &quot;Hatcher&quot;, file_name: &quot;/path/to/ErikHatcherResume.pdf&quot;, created_at: &quot;2009-02-17 22:33:16&quot;, updated_at: &quot;2009-02-17 22:33:16&quot;&gt;]}&gt;</span></pre></div></div>

<p>See also Sami Siren&#8217;s <a href="http://www.lucidimagination.com/index.php?option=com_content&amp;task=view&amp;id=106">Content Extraction with Tika</a> article.</p>
<p>We encourage you to leave comments and feedback on this entry.  I&#8217;m particularly interested in hearing from Solr-using Rubyists out there about the challenges you&#8217;ve faced in using Solr and how we can help fix bugs or educate further.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
	</channel>
</rss>

