<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; Libraries</title>
	<atom:link href="http://www.lucidimagination.com/blog/category/libraries/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Performance of Google&#8217;s V8 Javascript engine in Solr</title>
		<link>http://www.lucidimagination.com/blog/2011/11/10/performance-of-googles-v8-javascript-engine-in-solr/</link>
		<comments>http://www.lucidimagination.com/blog/2011/11/10/performance-of-googles-v8-javascript-engine-in-solr/#comments</comments>
		<pubDate>Thu, 10 Nov 2011 15:39:34 +0000</pubDate>
		<dc:creator>Emmanuel Espina</dc:creator>
				<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4426</guid>
		<description><![CDATA[<p>The use of scripting languages to add new functionality to systems is something that I&#8217;ve always found very helpful. You don&#8217;t have to download the source code of the system; if it has “scriptable” parts, you can add simple functionality in minutes without even compiling. Java provides these capabilities, in particular with JavaScript. You can refer to <a href="http://java.sun.com/developer/technicalArticles/J2SE/Desktop/scripting/">http://java.sun.com/developer/technicalArticles/J2SE/Desktop/scripting/</a> for more information on this.</p>
<p>Unfortunately, the only engine bundled with Java 6 is Rhino, which converts the JavaScript &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>The use of scripting languages to add new functionality to systems is something that I&#8217;ve always found very helpful. You don&#8217;t have to download the source code of the system; if it has “scriptable” parts, you can add simple functionality in minutes without even compiling. Java provides these capabilities, in particular with JavaScript. You can refer to <a href="http://java.sun.com/developer/technicalArticles/J2SE/Desktop/scripting/">http://java.sun.com/developer/technicalArticles/J2SE/Desktop/scripting/</a> for more information on this.</p>
<p>Unfortunately, the only engine bundled with Java 6 is Rhino, which converts the JavaScript code into JVM code, and its performance is not good. For reasons like this, scripting languages in general are experiencing something that Java itself went through in the past: the popular belief that they are slow.</p>
<p>The performance is certainly not that bad, but in a critical application people would rather develop native (Java) components than risk losing performance to a probably unnecessary script. This entry is about the performance of scripting languages; in particular, about other JavaScript engines that you can attach to Java.</p>
<p>Google Chrome has surprised everyone with its blazing-fast JavaScript engine: V8.</p>
<p>I downloaded V8 and tested it against the regular Rhino option. I didn&#8217;t implement the necessary wrappers to add V8 to Java myself, nor the JNI C code needed as glue to access V8 functions (which are native in the traditional sense: real machine code running on real processors, like in the old days). Instead I used a library that I found on the internet: <a href="http://code.google.com/p/jav8/">http://code.google.com/p/jav8/</a>.</p>
<p>For the first test I downloaded a <a href="https://raw.github.com/espinaemmanuel/Blog-resources/master/lucid-11-10-2011/test.js">benchmark</a> from V8 that computes an RSA encryption of a text using Javascript. I know, you will feel that I&#8217;m cheating by using a benchmark that is not impartial, but the results are pretty conclusive anyway: <a title="http://v8.googlecode.com/svn/data/benchmarks/" href="http://v8.googlecode.com/svn/data/benchmarks/">http://v8.googlecode.com/svn/data/benchmarks/</a>.</p>
<p>The code (download <a href="https://raw.github.com/espinaemmanuel/Blog-resources/master/lucid-11-10-2011/main.java">here</a>) also shows the basic usage of the scripting functionality of Java.</p>
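<pre>import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class main {

	public static void main(String[] args) throws FileNotFoundException {

		ScriptEngineManager sm = new ScriptEngineManager();
		// "jav8" selects the V8-backed engine provided by the jav8 library
		ScriptEngine jsEngine = sm.getEngineByName("jav8");
		if (jsEngine == null) {
			throw new IllegalStateException("Script engine 'jav8' not found");
		}

		// Read the script once into a String: a Reader is exhausted after the
		// first eval, so reusing it would evaluate an empty script afterwards.
		Scanner in = new Scanner(new File("test.js"), "UTF-8");
		String script = in.useDelimiter("\\A").next();
		in.close();

		int iter = Integer.parseInt(args[0]);

		try {
			long acum = 0; // accumulated evaluation time in milliseconds
			for (int i = 0; i &lt; iter; i++) {
				long start = System.currentTimeMillis();
				jsEngine.eval(script);
				long end = System.currentTimeMillis();

				acum += end - start;
			}
			System.out.println(acum);
		} catch (ScriptException ex) {
			ex.printStackTrace();
		}
	}
}</pre>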
<p>I created a simple bash script to run this many times with different numbers of iterations, and these are the results (the X axis shows the number of iterations and the Y axis the time, so lower is better):</p>
<p style="text-align: left;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/11/plot-v8.png"><img class="aligncenter size-medium wp-image-4434" title="plot-v8" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/11/plot-v8-300x202.png" alt="" width="300" height="202" /></a></p>
<p>The performance difference is outstanding. Particularly interesting is how well V8 scales as we add more iterations. V8 has an advanced caching system, and surely that helps to keep the performance steady as the number of iterations grows.</p>
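<p>Timings like these depend on how the measurement loop is written. As a sketch only (the ScriptTimer class is hypothetical and was not used for the plot above), a small harness can run a few untimed warm-up rounds before accumulating the elapsed time with System.nanoTime():</p>
<pre>import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original benchmark.
final class ScriptTimer {

	// Runs the task "warmup" times untimed, then "iter" times timed,
	// and returns the accumulated elapsed time in milliseconds.
	static long timeMillis(Runnable task, int warmup, int iter) {
		for (int i = warmup; i > 0; i--) {
			task.run();
		}
		long acum = 0;
		for (int i = iter; i > 0; i--) {
			long start = System.nanoTime();
			task.run();
			acum += System.nanoTime() - start;
		}
		return TimeUnit.NANOSECONDS.toMillis(acum);
	}
}</pre>
<p>The task passed in would wrap a call such as jsEngine.eval(script), rethrowing any ScriptException as an unchecked exception.</p>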
<p>A real application that performs a set of repetitive tasks is the ScriptTransformer of Solr&#8217;s Data Import Handler. This transformer applies a function written in JavaScript to each row of a table. This infamous component is very useful, but its performance has always been horrible.</p>
<p>To continue the tests, I compared the standard Rhino script engine with V8 applied to Solr&#8217;s ScriptTransformer. I had to modify the ScriptTransformer and remove its “reflection style” implementation (apparently there to keep compatibility with Java 1.5, but quite ugly anyway) to make jav8 work with it. The test consisted of encrypting 5000 records of text from a database. (modified <a href="https://raw.github.com/espinaemmanuel/Blog-resources/master/lucid-11-10-2011/ScriptTransformer.java">ScriptTransformer.java</a>)</p>
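<p>What a script transformer does for each row can be sketched with the plain javax.script API. This is only an illustration, not Solr&#8217;s actual ScriptTransformer: the RowScriptSketch class, the upper function and the use of a Properties object as a stand-in for a row are all assumptions, and the sketch relies on the engine implementing the optional Invocable interface.</p>
<pre>import java.util.Properties;

import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// Hypothetical sketch, not Solr's actual ScriptTransformer.
final class RowScriptSketch {

	// Calls the named script function once for the given row and returns its result.
	// Assumes the engine implements the optional Invocable interface.
	static Object transformRow(ScriptEngine engine, String function, Properties row) throws Exception {
		return ((Invocable) engine).invokeFunction(function, row);
	}

	public static void main(String[] args) throws Exception {
		// "JavaScript" selects the JDK's bundled engine where one exists.
		ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
		if (engine == null) {
			System.out.println("No JavaScript engine available in this JVM");
			return;
		}
		// Define the row function once; only the invocation is repeated per row.
		engine.eval("function upper(row) {"
				+ " row.setProperty('name', String(row.getProperty('name')).toUpperCase());"
				+ " return row; }");

		Properties row = new Properties(); // stand-in for one row (column name to value)
		row.setProperty("name", "solr");
		System.out.println(transformRow(engine, "upper", row));
	}
}</pre>
<p>Evaluating the function definition once and then calling it per row keeps script parsing out of the loop, so the per-row cost is mostly the engine&#8217;s call overhead.</p>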
<p>The results:</p>
<table class="aligncenter" width="300">
<tbody>
<tr>
<td valign="top" width="340"><strong>Engine</strong></td>
<td valign="top" width="340"><strong>Time taken (seconds)</strong></td>
</tr>
<tr>
<td valign="top" width="340">V8</td>
<td valign="top" width="340">12.347</td>
</tr>
<tr>
<td valign="top" width="340">Rhino</td>
<td valign="top" width="340">83.255</td>
</tr>
</tbody>
</table>
<p>Again the results show that V8 wins by a big margin.</p>
<p>The conclusion we draw here is that adding script engines to our systems does not necessarily hurt performance. If you accept the use of native libraries (V8 in this case), a script engine can make your systems much more flexible without slowing them down.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/11/10/performance-of-googles-v8-javascript-engine-in-solr/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Solr Search User Interface Examples</title>
		<link>http://www.lucidimagination.com/blog/2010/01/14/solr-search-user-interface-examples/</link>
		<comments>http://www.lucidimagination.com/blog/2010/01/14/solr-search-user-interface-examples/#comments</comments>
		<pubDate>Thu, 14 Jan 2010 14:28:53 +0000</pubDate>
		<dc:creator>Erik Hatcher</dc:creator>
				<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1501</guid>
		<description><![CDATA[<p>A recent Slashdot poster asked for Solr-powered <a href="http://ask.slashdot.org/story/10/01/13/2014230/Attractive-Open-Source-Search-Interfaces">&#8220;Attractive Open Source Search Interfaces&#8221;</a>.  First, for some inspiration on what you might want to have in a search user interface, check out <a href="http://www.flickr.com/photos/morville/collections/72157603785835882/">Peter Morville&#8217;s excellent set of screenshot examples</a>.  One of <a href="http://ask.slashdot.org/story/10/01/13/2014230/Attractive-Open-Source-Search-Interfaces">my favorite examples</a> is, of course, from the library space.  <a href="http://www.flickr.com/photos/morville/sets/72157603794374821/">Morville showcases the NCSU library system site</a> on one of his sets:</p>
<p>Several Solr-powered open source faceted navigation search systems for libraries have been &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>A recent Slashdot poster asked for Solr-powered <a href="http://ask.slashdot.org/story/10/01/13/2014230/Attractive-Open-Source-Search-Interfaces">&#8220;Attractive Open Source Search Interfaces&#8221;</a>.  First, for some inspiration on what you might want to have in a search user interface, check out <a href="http://www.flickr.com/photos/morville/collections/72157603785835882/">Peter Morville&#8217;s excellent set of screenshot examples</a>.  One of <a href="http://ask.slashdot.org/story/10/01/13/2014230/Attractive-Open-Source-Search-Interfaces">my favorite examples</a> is, of course, from the library space.  <a href="http://www.flickr.com/photos/morville/sets/72157603794374821/">Morville showcases the NCSU library system site</a> on one of his sets:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="300" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="flashvars" value="offsite=true&amp;lang=en-us&amp;page_show_url=%2Fphotos%2Fmorville%2Fsets%2F72157603794374821%2Fshow%2F&amp;page_show_back_url=%2Fphotos%2Fmorville%2Fsets%2F72157603794374821%2F&amp;set_id=72157603794374821&amp;jump_to=" /><param name="allowFullScreen" value="true" /><param name="src" value="http://www.flickr.com/apps/slideshow/show.swf?v=71649" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://www.flickr.com/apps/slideshow/show.swf?v=71649" allowfullscreen="true" flashvars="offsite=true&amp;lang=en-us&amp;page_show_url=%2Fphotos%2Fmorville%2Fsets%2F72157603794374821%2Fshow%2F&amp;page_show_back_url=%2Fphotos%2Fmorville%2Fsets%2F72157603794374821%2F&amp;set_id=72157603794374821&amp;jump_to="></embed></object></p>
<p>Several Solr-powered open source faceted navigation search systems for libraries have been built with various technologies:  <a href="http://projectblacklight.org/">Blacklight</a> (Ruby on Rails), <a href="http://vufind.org/">VUFind</a> (PHP), <a href="http://code.google.com/p/kochief/">Kochief</a> (Django), <a href="http://code.google.com/p/multifacet/">MULtifacet</a> (Drupal). The question is, how general-purpose are these user interfaces for non-library uses?  In theory they could all be repurposed in this way, as every library really has a need to customize the UI.  Blacklight, for example, is written up in the <a href="http://www.lucidimagination.com/blog/2010/01/11/book-review-solr-packt-book/">Solr 1.4 book (by Smiley and Pugh)</a> with a showcase that works on their MusicBrainz example.</p>
<p>The tough part of generalizing a search UI is that what we all really want is a custom-for-us UI, one that is as flexible as our imagination.  <strong>And</strong> it must fit pragmatically into the technology constraints of our operation.  For some, Ruby on Rails is the ONLY way to go, for others a Java-based UI tier is the only technology that fits.</p>
<p>Here are some pointers to various other UI technologies on top of Solr:</p>
<ul>
<li><a href="http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/">Solritas</a> &#8211; Apache Velocity templating.  Available, with some config, in Solr 1.4.</li>
<li><a href="http://code4lib.org/node/154">Solr Flare</a> &#8211; a proof-of-concept RoR UI plugin; does Ajax suggest, faceted navigation, saved (session-based) searches, and more.</li>
<li><a href="http://github.com/evolvingweb/ajax-solr">AJAX Solr</a> &#8211; JavaScript, purely client-side widgets</li>
</ul>
<p>These are covered a bit with screenshots in my <a href="http://www.lucidimagination.com/blog/2009/08/20/edui-conference-solr-flair-search-user-interfaces-powered-by-apache-solr/">EdUI presentation &#8220;Solr Flair: Search User Interfaces Powered by Apache Solr&#8221;</a>.</p>
<p>What other open source UI frameworks live on top of Solr?  Add a comment with a pointer!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/01/14/solr-search-user-interface-examples/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Library Love</title>
		<link>http://www.lucidimagination.com/blog/2010/01/08/lucid-library-love/</link>
		<comments>http://www.lucidimagination.com/blog/2010/01/08/lucid-library-love/#comments</comments>
		<pubDate>Fri, 08 Jan 2010 20:19:54 +0000</pubDate>
		<dc:creator>Erik Hatcher</dc:creator>
				<category><![CDATA[Libraries]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[library]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1453</guid>
		<description><![CDATA[<p>I&#8217;ve long had a passion for improving the findability within libraries.  The richness of the cultural artifacts that one can find with a bit of foraging astonishes the imagination.  I had the pleasure of working with the <a href="http://patacriticism.org">Applied Research in Patacriticism</a> group at the University of Virginia.  While building the first version of <a href="http://www.collex.org/">Collex</a> (collect/exhibit) for <a href="http://www.nines.org">NINES</a> I was approached by <a href="http://www.ibiblio.org/bess/">Bess Sadler</a> asking about the viability of using Solr for searching and faceting on &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve long had a passion for improving findability within libraries.  The richness of the cultural artifacts that one can find with a bit of foraging astonishes the imagination.  I had the pleasure of working with the <a href="http://patacriticism.org">Applied Research in Patacriticism</a> group at the University of Virginia.  While building the first version of <a href="http://www.collex.org/">Collex</a> (collect/exhibit) for <a href="http://www.nines.org">NINES</a> I was approached by <a href="http://www.ibiblio.org/bess/">Bess Sadler</a> asking about the viability of using Solr for searching and faceting on library data.  The library world was just seeing scalable faceting take stage with NCSU&#8217;s Endeca installation, but the price tag prohibited most every other institution from enjoying faceting.  With prodding from Bess, I learned a bit about MARC, created some Ruby scripts, invented Solr Flare, and was able to pretty much match what NCSU was doing with only a handful of evenings of hacking.   I presented my work with an all-day preconference class on Solr and <a href="http://code4lib.org/2007/hatcher">keynote at the 2007 code4lib conference</a>.   A lot of things have happened since, and in large part because of, this initial work.  Solr Flare spun off into <a href="http://projectblacklight.org/">Blacklight</a>, a Ruby on Rails front-end being used by <a href="http://lib.stanford.edu/searchworks/">Stanford&#8217;s SearchWorks</a> effort, <a href="http://virgobeta.lib.virginia.edu/">UVa&#8217;s &#8220;VIRGO Beta&#8221;</a>, and a number of other institutions.  VUFind, a PHP-based front-end, is also popular, and there are several other OPACs (online public access catalogs, a fancy name for &#8220;website with a search box&#8221;) that reside on top of Solr.<span id="more-1453"></span></p>
<p>VUFind and Blacklight both share a common indexer, <a href="http://code.google.com/p/solrmarc/">SolrMarc</a>.  SolrMarc provides a flexible, extensible tool for mapping the complex standard library MARC format into Solr documents.</p>
<p>Recently it was reported that SolrMarc indexing performance needed help (<a href="http://groups.google.com/group/solrmarc-tech/browse_thread/thread/fe329385bb1dc953">Stanford reported 12 hours to index 6M records</a>).  I couldn&#8217;t help but want to help.  So I grabbed the latest SolrMarc (version 2.1, in development), a <a href="http://www.archive.org/details/talis_openlibrary_contribution">publicly available MARC file containing 5.7M records</a>, and gave it a try.  First I ran SolrMarc against the file, and I killed the job after 9 hours.  Rather than looking too deeply into the code to see what might be wrong, I decided to get a baseline on how fast indexing MARC could be using the simplest thing that could possibly work.  I created a custom MarcEntityProcessor, a hook into Solr&#8217;s DataImportHandler.  Using the MARC4J library directly, only indexing the id and a toString() of the entire MARC record, I was able to index the same dataset in 55 minutes.  <a href="http://groups.google.com/group/solrmarc-tech/browse_thread/thread/b9ba2ed86f5da979">It went from 22 docs/s to 1,745 docs/s!</a>  To be fair, my implementation didn&#8217;t do the fancy mappings needed in the real world, and there is still a bit of work to do in order to fully flesh out a DataImportHandler refactoring, but hopefully this new approach will be embraced by the library Solr community.</p>
<p>This was a long-winded way of saying&#8230; I&#8217;m devoting a chunk of my next couple of months to the needs of the Solr-using library community.  My favorite conference of all time is coming up, the <a href="http://code4lib.org/conference/2010/">code4lib conference</a>, and I&#8217;m getting ramped up.   Naomi Dushay (of Stanford) and I are leading a <a href="http://code4lib.org/conference/2010/schedule#preconf">Solr Blackbelt preconference</a> where we&#8217;ll be going through heavy topics like query parsing and improving relevancy.</p>
<p>Stay tuned for lots more light being shed on Solr in the Library!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/01/08/lucid-library-love/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

