<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; Mark Miller</title>
	<atom:link href="http://www.lucidimagination.com/blog/author/markmiller/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>SolrCloud is Coming (and looking to mix in even more &#8216;NoSQL&#8217;)</title>
		<link>http://www.lucidimagination.com/blog/2012/01/23/solrcloud-is-coming-and-looking-to-mix-in-even-more-nosql/</link>
		<comments>http://www.lucidimagination.com/blog/2012/01/23/solrcloud-is-coming-and-looking-to-mix-in-even-more-nosql/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 14:40:19 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4626</guid>
		<description><![CDATA[<p>The second phase of SolrCloud has been in full swing for a couple of months now and it looks like we are going to be able to commit this work to trunk very soon! In Phase 1 we built on top of Solr&#8217;s distributed search capabilities and added cluster state, central config, and built-in read-side fault tolerance. Phase 2 is even more ambitious and focuses on the write side. We are talking full-blown fault tolerance for reads &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>The second phase of SolrCloud has been in full swing for a couple of months now and it looks like we are going to be able to commit this work to trunk very soon! In Phase 1 we built on top of Solr&#8217;s distributed search capabilities and added cluster state, central config, and built-in read-side fault tolerance. Phase 2 is even more ambitious and focuses on the write side. We are talking full-blown fault tolerance for reads and writes, near real-time support, real-time GET, true single-node durability, optimistic locking, cluster elasticity, improvements to the Phase 1 features, and more.</p>
<p>Once we get Phase 2 into trunk we will work on hardening and finishing a couple of missing features &#8211; then SolrCloud should be ready to be part of the upcoming Lucene/Solr 4.0 release.</p>
<p>If you want to read more about SolrCloud and where we are with Phase 2, check out the new wiki page that we are working on at <a href="http://wiki.apache.org/solr/SolrCloud">http://wiki.apache.org/solr/SolrCloud</a> - feedback appreciated!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2012/01/23/solrcloud-is-coming-and-looking-to-mix-in-even-more-nosql/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Lucene/Solr 3.5 Released</title>
		<link>http://www.lucidimagination.com/blog/2011/11/28/lucenesolr-3-5-released/</link>
		<comments>http://www.lucidimagination.com/blog/2011/11/28/lucenesolr-3-5-released/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 14:24:35 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4486</guid>
		<description><![CDATA[<p>Official release announcement for Lucene/Solr 3.5:</p>
<h3><em>November 27 2011,</em> <strong>Apache Lucene™ 3.5.0 available</strong></h3>
<p>&#160;</p>
<p>The Lucene PMC is pleased to announce the release of Apache Lucene 3.5.0.</p>
<p>&#160;</p>
<p>Apache Lucene is a high-performance, full-featured text search engine</p>
<p>library written entirely in Java. It is a technology suitable for nearly</p>
<p>any application that requires full-text search, especially cross-platform.</p>
<p>&#160;</p>
<p>This release contains numerous bug fixes, optimizations, and</p>
<p>improvements, some of which are highlighted below.  The release&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Official release announcement for Lucene/Solr 3.5:</p>
<h3><em>November 27 2011,</em> <strong>Apache Lucene™ 3.5.0 available</strong></h3>
<p>&nbsp;</p>
<p>The Lucene PMC is pleased to announce the release of Apache Lucene 3.5.0.</p>
<p>&nbsp;</p>
<p>Apache Lucene is a high-performance, full-featured text search engine</p>
<p>library written entirely in Java. It is a technology suitable for nearly</p>
<p>any application that requires full-text search, especially cross-platform.</p>
<p>&nbsp;</p>
<p>This release contains numerous bug fixes, optimizations, and</p>
<p>improvements, some of which are highlighted below.  The release</p>
<p>is available for immediate download at:</p>
<p>&nbsp;</p>
<p><a href="http://www.apache.org/dyn/closer.cgi/lucene/java">http://www.apache.org/dyn/closer.cgi/lucene/java</a> (see note below).</p>
<p>&nbsp;</p>
<p>See the CHANGES.txt file included with the release for a full list of</p>
<p>details.</p>
<p>&nbsp;</p>
<p><strong>Lucene 3.5.0 Release Highlights:</strong></p>
<p>&nbsp;</p>
<p>* Added a very substantial (3-5X) RAM reduction required to hold the</p>
<p>terms index on opening an IndexReader. (LUCENE-2205)</p>
<p>&nbsp;</p>
<p>* Added IndexSearcher.searchAfter which returns results after a</p>
<p>specified ScoreDoc (e.g. last document on the previous page) to</p>
<p>support deep paging use cases. (LUCENE-2215)</p>
<p>&nbsp;</p>
<p>* Added SearcherManager to manage sharing and reopening IndexSearchers</p>
<p>across multiple search threads. Underlying IndexReader instances are</p>
<p>safely closed if not referenced anymore. (LUCENE-3445, LUCENE-3558)</p>
<p>&nbsp;</p>
<p>* Added SearcherLifetimeManager which safely provides a consistent</p>
<p>view of the index across multiple requests (e.g. paging/drilldown).</p>
<p>(LUCENE-3558, LUCENE-3486)</p>
<p>&nbsp;</p>
<p>* Renamed IndexWriter.optimize to forceMerge to discourage use of</p>
<p>this method since it is horribly costly and rarely justified</p>
<p>anymore. (LUCENE-3439)</p>
<p>&nbsp;</p>
<p>* Added NGramPhraseQuery that speeds up phrase queries 30-50%</p>
<p>when n-gram analysis is used. (LUCENE-3426)</p>
<p>&nbsp;</p>
<p>* Added a new reopen API (IndexReader.openIfChanged) that</p>
<p>returns null instead of the old reader if there are no changes</p>
<p>in the index. (LUCENE-3464)</p>
<p>&nbsp;</p>
<p>* Improvements to vector highlighting: support for more queries</p>
<p>such as wildcards and boundary analysis for generated snippets</p>
<p>(LUCENE-1824, LUCENE-1889)</p>
<p>&nbsp;</p>
<p>* IndexSearcher and IndexReader now perform additional checks to</p>
<p>throw AlreadyClosedExceptions if searches are performed on a</p>
<p>closed IndexReader. Performing searches on already closed reader</p>
<p>can cause JVM crashes when invalid memory mapped files are</p>
<p>referenced.</p>
<p>&nbsp;</p>
<p>* Several bugfixes, including a bug where closing an NRT reader</p>
<p>after the writer was closed was incorrectly invoking the</p>
<p>DeletionPolicy. See CHANGES.txt entries for full details.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3><em>27 November 2011,</em> <strong>Apache Solr™ 3.5.0 available</strong></h3>
<p>The Lucene PMC is pleased to announce the release of Apache Solr 3.5.0.</p>
<p>&nbsp;</p>
<p>Solr is the popular, blazing fast open source enterprise search platform from</p>
<p>the Apache Lucene project. Its major features include powerful full-text</p>
<p>search, hit highlighting, faceted search, dynamic clustering, database</p>
<p>integration, rich document (e.g., Word, PDF) handling, and geospatial search.</p>
<p>Solr is highly scalable, providing distributed search and index replication,</p>
<p>and it powers the search and navigation features of many of the world&#8217;s</p>
<p>largest internet sites.</p>
<p>&nbsp;</p>
<p>This release contains numerous bug fixes, optimizations, and</p>
<p>improvements, some of which are highlighted below.  The release</p>
<p>is available for immediate download at:</p>
<p><a href="http://www.apache.org/dyn/closer.cgi/lucene/solr">http://www.apache.org/dyn/closer.cgi/lucene/solr</a> (see note below).</p>
<p>&nbsp;</p>
<p>See the CHANGES.txt file included with the release for a full list of</p>
<p>details.</p>
<p>&nbsp;</p>
<p><strong>Solr 3.5.0 Release Highlights:</strong></p>
<p>&nbsp;</p>
<p>* Bug fixes and improvements from Apache Lucene 3.5.0, including a</p>
<p>very substantial (3-5X) RAM reduction required to hold the terms</p>
<p>index on opening an IndexReader. (LUCENE-2205)</p>
<p>&nbsp;</p>
<p>* Added support for distributed result grouping. (SOLR-2066,</p>
<p>SOLR-2776)</p>
<p>&nbsp;</p>
<p>* Added support for Hunspell stemmer TokenFilter supporting stemming</p>
<p>for 99 languages. (SOLR-2769)</p>
<p>&nbsp;</p>
<p>* A new contrib module &#8220;langid&#8221; adds language identification</p>
<p>capabilities as an Update Processor, using Tika&#8217;s</p>
<p>LanguageIdentifier or Cybozu language-detection library (SOLR-1979)</p>
<p>&nbsp;</p>
<p>* Numeric types including Trie and date types now support</p>
<p>sortMissingFirst/Last. (SOLR-2881)</p>
<p>&nbsp;</p>
<p>* Added hl.q parameter. It is optional and if it is specified, it overrides</p>
<p>q parameter in Highlighter. (SOLR-1926)</p>
<p>&nbsp;</p>
<p>* Several minor bugfixes like date parsing for years from 0001-1000, ignored</p>
<p>configurations when using QueryAnalyzer with SpellCheckComponent</p>
<p>and many more.</p>
<p>See CHANGES.txt entries for full details.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>Note: The Apache Software Foundation uses an extensive mirroring network for</p>
<p>distributing releases.  It is possible that the mirror you are using may not</p>
<p>have replicated the release yet.  If that is the case, please try another</p>
<p>mirror.  This also goes for Maven access.</p>
<p>&nbsp;</p>
<p>Happy searching,</p>
<p>&nbsp;</p>
<p>Apache Lucene/Solr Developers</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/11/28/lucenesolr-3-5-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NearRealTime Search in Solr 4</title>
		<link>http://www.lucidimagination.com/blog/2011/09/04/nearrealtime-search-in-solr-4/</link>
		<comments>http://www.lucidimagination.com/blog/2011/09/04/nearrealtime-search-in-solr-4/#comments</comments>
		<pubDate>Sun, 04 Sep 2011 15:56:37 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3902</guid>
		<description><![CDATA[<p>Now that NearRealTime search in Solr trunk has had a bit of time to <a href="http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-%E2%80%98near-realtime%E2%80%99-improvements/">bake</a>, I&#8217;m starting to document how to take advantage of it on the Solr wiki: <a href="http://wiki.apache.org/solr/NearRealtimeSearch">http://wiki.apache.org/solr/NearRealtimeSearch</a>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Now that NearRealTime search in Solr trunk has had a bit of time to <a href="http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-%E2%80%98near-realtime%E2%80%99-improvements/">bake</a>, I&#8217;m starting to document how to take advantage of it on the Solr wiki: <a href="http://wiki.apache.org/solr/NearRealtimeSearch">http://wiki.apache.org/solr/NearRealtimeSearch</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/09/04/nearrealtime-search-in-solr-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running Solr as a Service on Linux</title>
		<link>http://www.lucidimagination.com/blog/2011/08/10/running-solr-as-a-service-on-linux/</link>
		<comments>http://www.lucidimagination.com/blog/2011/08/10/running-solr-as-a-service-on-linux/#comments</comments>
		<pubDate>Wed, 10 Aug 2011 13:23:34 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3824</guid>
		<description><![CDATA[<h1 lang="en-US"><span style="font-family: Helvetica, sans-serif; font-weight: normal; font-size: small;">Let’s install Solr as a service on Linux. I’m using Ubuntu 11.04.</span></h1>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">First download the latest version of Solr (3.3 as of this writing) from: <a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/"><span style="color: #000099;"><span style="text-decoration: underline;">http://www.apache.org/dyn/closer.cgi/lucene/solr/</span></span></a></span></span></span></p>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Extract the compressed zip or tgz file to where you would like Solr to live.</span></span></span></p>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Currently, I like using runit to run Linux services. <a href="http://smarden.org/runit/"><span style="color: #000099;"><span style="text-decoration: underline;">http://smarden.org/runit/</span></span></a></span></span></span></p>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Install runit with: <strong>sudo apt-get install runit</strong></span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;"><br />
</span></span></span></p>
<p style="text-align: center;" lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-1.png"><img class="aligncenter size-full wp-image-3825" title="Screenshot-1" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-1.png" alt="" width="465" height="272" /></a></p>
<p style="text-align: center;" lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Create a new service directory.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-2.png"><img class="size-full wp-image-3826 alignleft" title="Screenshot-2" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-2.png" alt="" width="317" height="72" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&#160;</p>
<p lang="en-US">&#160;</p>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">Create a new shell </span>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<h1 lang="en-US"><span style="font-family: Helvetica, sans-serif; font-weight: normal; font-size: small;">Let’s install Solr as a service on Linux. I’m using Ubuntu 11.04.</span></h1>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">First download the latest version of Solr (3.3 as of this writing) from: <a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/"><span style="color: #000099;"><span style="text-decoration: underline;">http://www.apache.org/dyn/closer.cgi/lucene/solr/</span></span></a></span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Extract the compressed zip or tgz file to where you would like Solr to live.</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Currently, I like using runit to run Linux services. <a href="http://smarden.org/runit/"><span style="color: #000099;"><span style="text-decoration: underline;">http://smarden.org/runit/</span></span></a></span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Install runit with: <strong>sudo apt-get install runit</strong></span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;"><br />
</span></span></span></p>
<p style="text-align: center;" lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-1.png"><img class="aligncenter size-full wp-image-3825" title="Screenshot-1" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-1.png" alt="" width="465" height="272" /></a></p>
<p style="text-align: center;" lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Create a new service directory.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-2.png"><img class="size-full wp-image-3826 alignleft" title="Screenshot-2" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-2.png" alt="" width="317" height="72" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">Create a new shell script called run in the new /etc/sv/solr directory. You will need to have root permission to work in these directories, so use sudo. In this case, I want to run Solr as the user ‘mark’.</span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-8.png"><img class="size-full wp-image-3832 alignleft" title="Screenshot-8" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-8.png" alt="" width="326" height="75" /></a></p>
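<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">Written out as text, a run script along these lines might look like the following minimal sketch. The details are assumptions rather than a transcript of the screenshot: the Solr path is hypothetical, chpst (a tool that ships with runit) is used to switch to the user ‘mark’, and Solr is launched through the start.jar in its example directory.</span></p>
<pre>#!/bin/sh
# Hypothetical location of the extracted Solr example directory; adjust to your install.
cd /home/mark/apache-solr-3.3.0/example
# Replace this shell with the JVM, running as the unprivileged user 'mark'.
exec chpst -u mark java -jar start.jar</pre>
<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">Using exec matters here: runit supervises the process that the run script becomes, so the JVM should replace the shell rather than run as its child.</span></p>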
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;"><br />
Make the run script executable.</span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-4.png"><img class="size-full wp-image-3828 alignleft" title="Screenshot-4" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-4.png" alt="" width="372" height="23" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Let runit know about the new service.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-5.png"><img class="size-full wp-image-3829 alignleft" title="Screenshot-5" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-5.png" alt="" width="480" height="14" /></a></p>
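<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">Typed out, the last two steps might look like the sketch below. It assumes that runit on this system scans /etc/service for services, which is an assumption about the setup; adjust the link target if your runsvdir watches a different directory.</span></p>
<pre># Make the run script executable.
sudo chmod +x /etc/sv/solr/run
# Link the service directory into the directory that runsvdir scans (assumed to be /etc/service).
sudo ln -s /etc/sv/solr /etc/service/solr
# Optionally check that runit has picked the service up.
sudo sv status solr</pre>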
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Now Solr should be up and running. If it dies or you kill it, it will automatically be restarted. If the server is restarted, Solr will be launched on startup.</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">To stop the service: <strong>sudo sv stop solr</strong></span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">To start the service: <strong>sudo sv start solr</strong></span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Great.</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<h2 lang="en-US">Logging</h2>
<p>&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">By default, Solr logs to standard error (stderr). You likely want to add a log configuration file to have the most control over how Solr logs &#8211; see <a href="http://wiki.apache.org/solr/LoggingInDefaultJettySetup">http://wiki.apache.org/solr/LoggingInDefaultJettySetup</a>. To be lazy though (and perhaps safe), let’s make sure standard output (stdout) and stderr are nicely logged for us by runit.</span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">This method only logs stdout, so let’s first edit our Solr run script to redirect stderr to stdout.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot10.png"><img class="size-full wp-image-3834 alignleft" title="Screenshot10" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot10.png" alt="" width="404" height="75" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Now create a new directory called log in the /etc/sv/solr service directory. Inside this, create another script called run. This script will start the log service, run it under the user mark, and put the log files in the log directory we just made (we use . for the current working directory).</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot11.png"><img class="size-full wp-image-3835 alignleft" title="Screenshot11" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot11.png" alt="" width="253" height="36" /></a></p>
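<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">A minimal sketch of both scripts follows. The details are assumptions, not a transcript of the screenshots: the Solr path is hypothetical, chpst (part of runit) is used to switch to the user mark, and svlogd (runit’s logger) is given the optional -tt flag for human-readable timestamps.</span></p>
<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">/etc/sv/solr/run, with stderr redirected to stdout:</span></p>
<pre>#!/bin/sh
# Hypothetical location of the extracted Solr example directory; adjust to your install.
cd /home/mark/apache-solr-3.3.0/example
# Send stderr to stdout so the log service captures both streams.
exec 2&gt;&amp;1
# Replace this shell with the JVM, running as the user 'mark'.
exec chpst -u mark java -jar start.jar</pre>
<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">/etc/sv/solr/log/run:</span></p>
<pre>#!/bin/sh
# Run svlogd as 'mark' and write to the current directory (/etc/sv/solr/log).
exec chpst -u mark svlogd -tt .</pre>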
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">As we are running as mark, change the owner of the log dir to mark so that the log files can be created: <strong>sudo chown mark log</strong></span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Now make the new run script executable.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot12.png"><img class="size-full wp-image-3836 alignleft" title="Screenshot12" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot12.png" alt="" width="370" height="21" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">The next time runit starts, Solr’s log output will be written to the /etc/sv/solr/log/current file and rotated automatically for you.</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/08/10/running-solr-as-a-service-on-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Benchmarking the new Solr ‘Near Realtime’ Improvements.</title>
		<link>http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-%e2%80%98near-realtime%e2%80%99-improvements/</link>
		<comments>http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-%e2%80%98near-realtime%e2%80%99-improvements/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 12:50:24 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3743</guid>
		<description><![CDATA[<p>I’ve been working on integrating Solr into the Lucene benchmark module, and I’ve gotten the code to the point of being able to run some decent Solr NRT tests. I recently worked on re-architecting the Solr UpdateHandler as well, and I’m keen to look more deeply at some of the results of those changes. The updates to the UpdateHandler provided a series of benefits, most of which significantly improve Solr’s ability to do NRT without &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>I’ve been working on integrating Solr into the Lucene benchmark module, and I’ve gotten the code to the point of being able to run some decent Solr NRT tests. I recently worked on re-architecting the Solr UpdateHandler as well, and I’m keen to look more deeply at some of the results of those changes. The updates to the UpdateHandler provided a series of benefits, most of which significantly improve Solr’s ability to do NRT without using clever (and usually complicating) workarounds. I likely still have some things to check, some I’s to dot and T’s to cross, but I thought I’d share an early look at my investigation into performance changes.</p>
<p>To see how the recent changes have affected Solr’s performance, I decided to compare the most recent version of Solr trunk with a version from right before the UpdateHandler changes went into Solr trunk. I took the algorithm for NRT testing in Lucene and started tweaking it for use with Solr. I used my Intel Core 2 Quad @ 2.66 GHz for my test. It’s getting old, but it can still move a few bits. The results of this investigation follow.</p>
<p><strong>The Process</strong></p>
<p>So the first thing I did was check out the version of Solr that I wanted (revision 1141518) as well as the latest version of Solr trunk (at the time, revision 1144942). I then applied my Solr benchmark patch to each checkout and put together the following benchmark algorithm:</p>
<pre>{</pre>
<pre> StartSolrServer</pre>
<pre> SolrClearIndex</pre>
<pre> [ "PreLoad" { SolrAddDoc &gt; : 50000] : 4</pre>
<pre> SolrCommit</pre>
<pre> Wait(220)</pre>
<pre> [ "WarmupSearches" { SolrSearch &gt; : 4 ] : 1</pre>
<pre> # Get a new near-real-time reader, sequentially as fast as possible:</pre>
<pre> [ "UpdateIndexView" { SolrCommit &gt; : *] : 1 &amp;</pre>
<pre> # Index with 2 threads, each adding 100 docs per sec</pre>
<pre> [ "Indexing" { SolrAddDoc &gt; : * : 100/sec ] : 2 &amp;</pre>
<pre> # Redline search (from queries.txt) with 4 threads</pre>
<pre> [ "Searching" { SolrSearch &gt; : * ] : 4 &amp;</pre>
<pre> # Wait 60 sec, then wrap up</pre>
<pre> Wait(60)</pre>
<pre>}</pre>
<pre>StopSolrServer</pre>
<pre>RepSumByPref Indexing</pre>
<pre>RepSumByPref Searching</pre>
<pre>RepSumByPref UpdateIndexView</pre>
<p>&nbsp;</p>
<p>This algorithm will start up the Solr example server (I’m using out of the box settings for this test), clear the current index, and then load 200,000 Wikipedia docs into the index. Not a large index by any stretch, but it will let us see the timing effects of various commit and merge activities well enough to make some simple judgements. After committing, the algorithm then waits 220 seconds &#8211; this is required on the latest trunk version because commits no longer wait for background merges to complete &#8211; so we wait long enough for those merges to complete and not interfere with the benchmark. This is not necessary on the older version &#8211; that commit call will wait until the background merges are finished before returning.</p>
<p>Next we do 4 searches to warm up the index just a bit before starting a background thread that will continuously call commit sequentially, as fast as possible. Then we start two more background threads, each adding Wikipedia documents at a target rate of 100 docs per second. Then we start 4 background threads that each query Solr as fast as possible. We continue this barrage for a minute and then look at the results.</p>
<p>&nbsp;</p>
<p><strong>The Before Picture</strong></p>
<p><strong><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/07/image1.jpg"><img class="aligncenter size-full wp-image-3745" title="image1" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/07/image1.jpg" alt="" width="409" height="228" /></a><br />
</strong></p>
<p>&nbsp;</p>
<p>This is a graph of the “refresh” times &#8211; the time it took to perform each commit and open up a new view on the index. In this case, the index was refreshed 400 times in the minute we allowed the benchmark to run for. For the most part, the refresh time really does not look too bad. The average “refresh” time is actually just 150ms. Now that Lucene and Solr work mostly per segment, this process can naturally be pretty fast. And this is a pretty small index really. There is a troubling spike in this minute though &#8211; one “refresh” took about 23 seconds! The reason for this is that the commit triggered background merges, and Solr waited for those background merges to finish before opening a new IndexSearcher and releasing the commit lock. It gets worse though &#8211; not only was the refresh time hurt, but while that commit lock was held, neither of our 2 indexing threads could get a document into the index! They were effectively stalled. Over that minute, we were only able to index the Wikipedia documents at 13.91 documents per second, far below our target of 100 documents per second for each thread! Also, there was a very large block of time where no indexing happened at all. Less troubling, our 4 threads were able to query at a rate of 11.24 queries per second (this can likely vary wildly depending on the ‘challenge’ of the queries.txt file) [UPDATE 9/4/2011: the search rate is very low due to a problem with the initial benchmark - many queries ended up malformed - without so many errors, search performance jumps drastically]. But overall, this is not an optimal use of this desktop’s resources.</p>
<p>&nbsp;</p>
<p><strong>The After Picture</strong></p>
<p>&nbsp;</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/07/image2.jpg"><img class="aligncenter size-full wp-image-3746" title="image2" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/07/image2.jpg" alt="" width="404" height="228" /></a></p>
<p>Now we try with the new UpdateHandler. The new UpdateHandler no longer blocks updates while a commit is in progress. Nor does it wait for background merges to complete before opening a new IndexSearcher and returning.</p>
<p>The results are not bad &#8211; a low average refresh time of 116.74 ms, but also no 23 second spike. There are still spikes, but they are not too frequent, and stay below 2.5 seconds at worst. Micro spikes.</p>
<p>Even better though, our indexing rate is now 125.48 documents per second (vs 13.91 before). This is a fantastic increase &#8211; and likely without the large gaps of no indexing activity. The search performance dropped to 2.8 queries per second (from 11.24), but no doubt this is largely because of all the additional indexing activity that was able to take place. There was a lot more work which the CPUs could now do that they couldn&#8217;t before; since the indexing threads soaked up more CPU resources, queries were allocated fewer resources.</p>
<p><strong>The After Picture With Lucene NRT</strong></p>
<p><strong><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/07/image3.jpg"><img class="aligncenter size-full wp-image-3747" title="image3" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/07/image3.jpg" alt="" width="409" height="231" /></a><br />
</strong></p>
<p>&nbsp;</p>
<p>While I was changing around the UpdateHandler, a simple natural extension was to allow the use of Lucene’s NRT feature when opening new views of the index. This feature allows you to skip certain steps that a full commit performs. The tradeoff is that nothing is guaranteed to be on stable storage, but the benefit is very fast “refresh” times.</p>
<p>To take advantage of this in Solr, we added a new concept that I called a ‘soft’ commit. A soft commit returns quickly, but does not commit documents durably to stable storage. You must also occasionally call ‘hard’ commits to commit to stable storage. A ‘soft’ commit will refresh a SolrCore’s view of the index however.</p>
<p>You can now also set up a ‘soft’ auto-commit in solrconfig.xml. So you might, for example, set up a soft commit to occur every second or so, and a standard commit to occur every 5 minutes.</p>
<p>To test the new ‘soft’ commit feature, I changed the background commit line in the algorithm to:</p>
<p>[ "UpdateIndexView" { SolrCommit(soft) &gt; : *] : 1 &amp;</p>
<p>and ran the benchmark again.</p>
<p>Looking at the graph, it looks like the micro spikes are perhaps a little less frequent &#8211; more importantly though, the average “refresh” time has dropped from about 117ms to just 49ms. In the old case we were able to refresh the view 6.67 times per second (400 times in the minute) &#8211; in the new case without soft commit it was 8.56 times per second &#8211; and in the new case *with* soft commit, we were able to refresh the index view 22.18 times per second. We were also able to still index at 130 documents per second while running 3.64 queries per second.</p>
<p><strong>One More Benchmark</strong></p>
<p>Let’s try one more interesting benchmark. We tried to index at 200 documents per second in the previous tests &#8211; which my poor machine could not deliver on. So really, we just maxed out on indexing. In this test, I set the indexing rate to something achievable, rather than something that completely swamps the CPU &#8211; and looked at the results. This benchmark again used ‘soft’ commits.</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/07/image4.jpg"><img class="aligncenter size-full wp-image-3748" title="image4" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/07/image4.jpg" alt="" width="448" height="247" /></a></p>
<p>Things are looking pretty nice &#8211; an average of only 7.5 ms per refresh. That is a refresh rate of 132.6 times per second. Queries per second have also risen to nearly 6 per second from a little over 3 and a half. This is an interesting result, and shows that there is still some interesting investigating to do with various settings and algorithm changes.</p>
<p>&nbsp;</p>
<p><strong>Conclusion</strong></p>
<p>It looks like NRT in Solr has taken a nice step forward and these improvements will be available in Solr 4.0. There is still more to do though &#8211; certain features, such as faceting and function queries <span style="color: #ff0000;">(edit: when you use ord)</span>, do not all yet work per segment. This means that using them can require more time than you might like when ‘refreshing’ the index view. Eventually we hope to improve most of those cases even further &#8211; but even when using those features, in many cases, these changes will still allow you to significantly decrease your indexing-to-search latency without resorting to clever tricks like juggling SolrCores.</p>
<p>Hopefully this was a fun glimpse at some of the improvements. There is a lot left to look into.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-%e2%80%98near-realtime%e2%80%99-improvements/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Solr Dev Diary: Solr and Near Real-Time Search</title>
		<link>http://www.lucidimagination.com/blog/2011/04/09/solr-dev-diary-solr-and-near-real-time-search/</link>
		<comments>http://www.lucidimagination.com/blog/2011/04/09/solr-dev-diary-solr-and-near-real-time-search/#comments</comments>
		<pubDate>Sat, 09 Apr 2011 18:07:18 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3341</guid>
		<description><![CDATA[<p>Solr’s UpdateHandler has gotten a little crusty. Many of the implementation details are there due to old,  tired,  and removed requirements and functions. For those that do not know, documents that you add to Solr are actually put into the index by the UpdateHandler.</p>
<p>There are two details about the current UpdateHandler implementation that are particularly limiting.</p>
<p>First, Solr uses its own locks on top of Lucene, adding a coarser, unnecessary layer of locking on &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Solr’s UpdateHandler has gotten a little crusty. Many of the implementation details are there due to old,  tired,  and removed requirements and functions. For those that do not know, documents that you add to Solr are actually put into the index by the UpdateHandler.</p>
<p>There are two details about the current UpdateHandler implementation that are particularly limiting.</p>
<p>First, Solr uses its own locks on top of Lucene, adding a coarser, unnecessary layer of locking on top of the IndexWriter. These locks had a reason to exist once upon a time, but really, they no longer do. There is no reason to block additional document adds while performing a commit, but currently this is what Solr does. Removing these locks will reduce complexity and maintenance costs by allowing us to ‘mostly’ just use Lucene’s locking. Solr will also more easily inherit improvements from Lucene in this area.</p>
<p>Second, because of historical requirements, Solr will close and open a new IndexWriter on every commit. This means that every commit waits for all background index merging threads to finish merging. This can be a not insignificant amount of time &#8211; and during this time you cannot add any documents to the index. You also cannot see the documents that have just been added to the index until the merges and commit are complete. Really, the UpdateHandler should simply commit and open a new SolrIndexSearcher &#8211; with the background threads happily merging *in the background*.</p>
<p>There are a few other things that bug me as well.</p>
<p>Well, I’m going to fix them all now. Time to remove the crust and introduce Lucene near-real-time support to Solr. You should be able to open a new view on recently added content with Solr in a fraction of the time possible right now. It’s not right that you have to juggle SolrCores to attempt near real time index updates &#8211; it’s time to make things easier. Time to make things faster.</p>
<p>And when Lucene finishes its real-time support and stops IndexWriter flushes from blocking document additions, Solr will be even more ready to take advantage where it can. There will still be more to do &#8211; not everything Solr does is yet per segment, and replication is not currently very near-real-time friendly &#8211; but we will keep moving things in the right direction.</p>
<p>I’m tackling these changes here: <a href="https://issues.apache.org/jira/browse/SOLR-2193">https://issues.apache.org/jira/browse/SOLR-2193</a></p>
<p>- Mark</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/04/09/solr-dev-diary-solr-and-near-real-time-search/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Garbage Collection Bootcamp 1.0</title>
		<link>http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/</link>
		<comments>http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/#comments</comments>
		<pubDate>Sun, 27 Mar 2011 18:01:32 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3115</guid>
		<description><![CDATA[<h3>Table Of Contents</h3>
<ul>
<li><a href="#whatisgc">What is Garbage Collection</a></li>
<li><a href="#tuninggc">Tuning Garbage Collection</a></li>
<li><a href="#thecollectors">The Garbage Collectors</a></li>
<li><a href="#choosingacollector">Choosing a Collector</a></li>
</ul>
<p><a name="whatisgc"></a></p>
<h2 style="padding-top: 10px;">What is Garbage Collection</h2>
<p style="text-align: left;">Garbage collection in Java is the process of freeing the dynamic memory used by <a href="http://en.wikipedia.org/wiki/Object_(computer_science)">objects</a> that are no longer being used by an application. In languages such as C or C++, the developer is often responsible for managing dynamic memory (using <a href="http://en.wikipedia.org/wiki/Malloc">malloc</a> and free or <a href="http://en.wikipedia.org/wiki/New_(C%2B%2B)">new</a> and <a href="http://en.wikipedia.org/wiki/Delete_(C%2B%2B)">delete</a>). However, in Java, this task is left &#8230;</p>]]></description>
			<content:encoded><![CDATA[<h3>Table Of Contents</h3>
<ul>
<li><a href="#whatisgc">What is Garbage Collection</a></li>
<li><a href="#tuninggc">Tuning Garbage Collection</a></li>
<li><a href="#thecollectors">The Garbage Collectors</a></li>
<li><a href="#choosingacollector">Choosing a Collector</a></li>
</ul>
<p><a name="whatisgc"></a></p>
<h2 style="padding-top: 10px;">What is Garbage Collection</h2>
<p style="text-align: left;">Garbage collection in Java is the process of freeing the dynamic memory used by <a href="http://en.wikipedia.org/wiki/Object_(computer_science)">objects</a> that are no longer being used by an application. In languages such as C or C++, the developer is often responsible for managing dynamic memory (using <a href="http://en.wikipedia.org/wiki/Malloc">malloc</a> and free or <a href="http://en.wikipedia.org/wiki/New_(C%2B%2B)">new</a> and <a href="http://en.wikipedia.org/wiki/Delete_(C%2B%2B)">delete</a>). However, in Java, this task is left up to something known as the garbage collector. A garbage collector automatically frees unused memory, freeing the developer from much of this thankless memory juggling.</p>
<p style="text-align: left;">The most basic garbage collection algorithm works by starting at the root objects (i.e. objects on the thread stack, static objects, etc) that are live (live meaning currently in use) &#8211; and then iterating down over every reachable object. Any object that cannot be reached in this manner is garbage and can be collected. The application is paused while this process goes on. This is referred to as mark and sweep – first you mark the objects that are live, then you sweep those that are not. The time needed to do this is obviously proportional to the number of live objects (which can be quite a large number in modern Java applications), and so more efficient collection schemes have been devised.</p>
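<p style="text-align: left;">As a rough illustration of the mark and sweep idea, here is a toy sketch in Java. It is not how the JVM implements collection: the object graph is just an array of reference lists, and the names <code>MarkSweepSketch</code>, <code>mark</code> and <code>sweep</code> are made up for this example.</p>

```java
import java.util.Arrays;

// Toy model only: an object is an index, and refs[i] lists the objects that i points to.
final class MarkSweepSketch {

    // Mark phase: start at the roots and flag everything reachable as live.
    static boolean[] mark(int[][] refs, int[] roots) {
        boolean[] live = new boolean[refs.length];
        int[] stack = new int[refs.length]; // each object is pushed at most once
        int top = 0;
        for (int root : roots) {
            if (!live[root]) {
                live[root] = true;
                stack[top++] = root;
            }
        }
        while (top != 0) {
            int current = stack[--top];
            for (int target : refs[current]) {
                if (!live[target]) {
                    live[target] = true;
                    stack[top++] = target;
                }
            }
        }
        return live;
    }

    // Sweep phase: everything that was not marked is garbage.
    static int[] sweep(boolean[] live) {
        int[] garbage = new int[live.length];
        int count = 0;
        for (int i = 0; i != live.length; i++) {
            if (!live[i]) {
                garbage[count++] = i;
            }
        }
        return Arrays.copyOf(garbage, count);
    }

    public static void main(String[] args) {
        // 0 points to 1 and 1 points to 2; objects 3 and 4 only point at each other.
        int[][] refs = { {1}, {2}, {}, {4}, {3} };
        int[] roots = { 0 };
        System.out.println("garbage: " + Arrays.toString(sweep(mark(refs, roots))));
    }
}
```

<p style="text-align: left;">Objects 3 and 4 reference each other, but neither can be reached from the root, so both are swept. Reachability from the roots is all that counts.</p>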
<p style="text-align: center;"><img class="size-full wp-image-1097   aligncenter" title="Heap Spaces" src="http://www.lucidimagination.com/blog/wp-content/uploads/2009/09/gc-spaces1.png" alt="gc-spaces" width="281" height="256" /></p>
<p style="text-align: left;">One such scheme comes from the natural fact that you can divide up objects based on how long they live. Most applications create a lot of very short lived objects, and fewer objects that are around for a long time (I&#8217;ve seen estimates that for the average application, 85-98% of allocated objects are short lived). You can take advantage of this fact when doing collections. In Java, objects are allocated from a region of memory known as the <a href="http://en.wikipedia.org/wiki/Dynamic_memory_allocation">heap</a>. The Java heap is generally divided up into a few spaces (it&#8217;s usually the same across implementations, but there is the odd exception or two). The major spaces are the young generation, the tenured generation (also called the old generation), and the permanent generation. The young generation is then further subdivided into the eden space and two survivor spaces. The permanent generation is generally for objects that are around for the life of the application (interned Strings, class objects, etc) and doesn&#8217;t usually play much of a role in garbage collection. The permanent generation size is not part of the heap region defined with -Xms and -Xmx. Though a very unusual need, it is still worth noting that the permanent generation can actually be collected if needed using:</p>
<pre>-XX:+CMSPermGenSweepingEnabled</pre>
<p style="text-align: left;"><span id="more-3115"></span>When objects are first created, they are allocated within the eden space. When the eden space becomes full, the still live objects within it are copied into one of the survivor spaces (or if they don&#8217;t fit, into the tenured space). One survivor space is always left empty, and on each young generation collection (a minor collection), the live objects from the eden space and the non empty survivor space are copied into the empty survivor space.  This leaves a newly emptied survivor space for the next round, as any still live objects in the formerly full survivor space will be copied into the tenured space.</p>
<p style="text-align: left;">As you can see, rather than running over every object for every collection now, you can collect the young generation more often, and the tenured generation (long lived objects) much less often. You can also optimize your collection for the characteristics of the space – i.e. usually, almost all of the objects in the young space will be garbage. In general, an object will have to survive a couple of minor collections to make it to the tenured space (first making it into a survivor space and then the tenured space). A copying collector identifies garbage by copying live objects from one space to another &#8211; anything left over is by definition garbage. The Sun JDK uses copying collectors for the young space and mark and sweep type collectors for the tenured space.</p>
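<p style="text-align: left;">If you want to see these spaces in your own JVM, the standard <code>java.lang.management</code> API will list them. The sketch below is only an illustration (the class name <code>HeapSpaces</code> is made up), and the pool names it prints depend on which collector is in use, so expect the eden, survivor and tenured names to differ between JVMs.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class HeapSpaces {

    // Prints one line per memory pool and returns how many pools were listed.
    static int printPools() {
        int listed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.println(pool.getType() + "  " + pool.getName()
                    + "  used=" + (usage.getUsed() / 1024) + "K");
            listed++;
        }
        return listed;
    }

    public static void main(String[] args) {
        printPools();
    }
}
```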
<p style="text-align: left;">
<p><a name="tuninggc"></a></p>
<h2 style="padding-top: 10px;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301246465_length-measure.png"><img class="alignleft size-full wp-image-3204" title="1301246465_length-measure" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301246465_length-measure.png" alt="" width="24" height="24" /></a>Tuning Garbage Collection</h2>
<p style="text-align: left;">Tuning for garbage collection means adjusting the sizes of the various spaces mentioned in the previous section, as well as the algorithms used to collect them. You can do this with various JVM command line options.</p>
<p style="text-align: left;">The amount of RAM available for the various spaces is dependent upon the size of the heap that the JVM has allocated. Defaults are chosen based on the hardware detected, but you can usually do better by specifying a good Xms, Xmx yourself. On a server machine, it can be a good idea to pin those two settings together so that the JVM does not waste any time resizing itself. You generally do not want to size the heap much larger than is needed &#8211; this can needlessly increase the cost of full garbage collections, and take RAM from other important activities, such as file system caching.</p>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>
<pre>-Xms</pre>
</td>
<td>Initial Heap Size</td>
</tr>
<tr>
<td>
<pre>-Xmx</pre>
</td>
<td>Maximum Heap Size</td>
</tr>
</tbody>
</table>
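<p style="text-align: left;">To check what the JVM actually ended up with, you can ask it at run time. This is a small sketch using the standard <code>MemoryMXBean</code>; the class name <code>HeapSettings</code> and the helper <code>describe</code> are made up for the example. The reported values can differ a little from the flags you passed, depending on the collector.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapSettings {

    // MemoryUsage reports -1 when a value is undefined.
    static String describe(long bytes) {
        if (bytes == -1L) {
            return "undefined";
        }
        return (bytes / (1024L * 1024L)) + " MB";
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("initial heap:   " + describe(heap.getInit()));      // roughly -Xms
        System.out.println("committed heap: " + describe(heap.getCommitted())); // what is reserved right now
        System.out.println("maximum heap:   " + describe(heap.getMax()));       // roughly -Xmx
    }
}
```

<p style="text-align: left;">Running it with the two settings pinned together, for example <code>java -Xms512m -Xmx512m HeapSettings</code>, should show the initial and committed sizes starting out at about the same value.</p>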
<p><strong>A Note About JVM Cmd Line Options</strong></p>
<ul>
<li>Boolean options &#8211;   <strong>On</strong>: <code>-XX:+&lt;option&gt;</code> <strong>Off</strong>: <code>-XX:-&lt;option&gt;</code>.</li>
<li>Numeric options:  <code>-XX:&lt;option&gt;=&lt;number&gt;</code>. Numbers can include &#8216;m&#8217; or &#8216;M&#8217; for megabytes, &#8216;k&#8217; or &#8216;K&#8217; for kilobytes, and &#8216;g&#8217; or &#8216;G&#8217; for gigabytes (1M= 1048576). In the case of Xms and Xmx, only one X is used and no colon.</li>
<li>String options: <code>-XX:&lt;option&gt;=&lt;string&gt; </code></li>
</ul>
<h4 style="padding-top: 16px;"><strong><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245152_layer-resize.png"><img class="alignleft size-full wp-image-3189" title="1301245152_layer-resize" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245152_layer-resize.png" alt="" width="16" height="16" /></a> Sizing the individual spaces</strong></h4>
<p style="text-align: left;">You usually want to grant plenty of memory to the young generation – especially when you have multiple processors – as allocation can be parallelized and each thread will get its own private piece of the eden space to work with. You generally want the young generation to have less than half the space of the tenured generation though – especially when using the Serialized collector. About 33% is usually a good number to start from. The best size will vary from application to application depending on its distribution of young vs long lived objects. You don&#8217;t want the young space to be so small that many short lived objects are getting piled into the tenured space. You also usually don&#8217;t want it to be so large that the tenured space doesn&#8217;t have enough space available to it and/or young generation collections start taking too long to complete.</p>
<p style="text-align: left;">Other than sizing the total heap, sizing the new generation (another name for the young generation) can be the most important piece to good performance.</p>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>
<pre>-XX:NewSize</pre>
</td>
<td>(Since 5.0) Size of the young generation at JVM startup – this is calculated automatically if you specify NewRatio</td>
</tr>
<tr>
<td>
<pre>-XX:MaxNewSize</pre>
</td>
<td>(Since 1.4) The largest size the young generation can grow to (unlimited if not specified)</td>
</tr>
<tr>
<td>
<pre>-Xmn</pre>
</td>
<td>Sets the new generation to a fixed size &#8211; this is not usually recommended unless you are fixing the other memory sizes as well.</td>
</tr>
<tr>
<td>
<pre>-XX:NewRatio</pre>
</td>
<td>Sets the new generation size as a ratio to the tenured generation size.</td>
</tr>
<tr>
<td>
<pre>-XX:SurvivorRatio</pre>
</td>
<td>You can also control the sizing of the survivor spaces – in practice this is not usually very helpful though.</td>
</tr>
</tbody>
</table>
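<p style="text-align: left;">The two ratio options are easier to reason about with a little arithmetic. <code>NewRatio=n</code> means the tenured generation is n times the size of the young generation, and <code>SurvivorRatio=n</code> means eden is n times the size of one survivor space. The sketch below just does that math; the class and method names are made up, the heap size is a hypothetical example, and the real JVM rounds and aligns the spaces, so treat the output as an approximation.</p>

```java
final class GenerationSizing {

    // -XX:NewRatio=n: tenured to young is n to 1, so young is heap / (n + 1).
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    // -XX:SurvivorRatio=n: eden to one survivor space is n to 1, and there are
    // two survivor spaces, so each one is young / (n + 2).
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 3072;              // hypothetical: -Xms3g -Xmx3g
        long young = youngMb(heapMb, 2); // hypothetical: -XX:NewRatio=2
        System.out.println("young    = " + young + " MB");
        System.out.println("tenured  = " + (heapMb - young) + " MB");
        System.out.println("eden     = " + edenMb(young, 6) + " MB");     // hypothetical: -XX:SurvivorRatio=6
        System.out.println("survivor = " + survivorMb(young, 6) + " MB (each of two)");
    }
}
```

<p style="text-align: left;">With these example numbers the young generation gets a third of the heap (1024 MB of 3072 MB), which is half the size of the tenured generation.</p>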
<p style="text-align: left;">The best sizing is usually chosen by playing with the parameters and then testing the performance of your application. Often, the JVM uses good defaults, or depending on the garbage collector in use, resizes the spaces on its own based on historical statistics.</p>
<p style="text-align: left;">There are a few helpful tools that give you insight into the garbage collection process.</p>
<h4 style="padding-top: 8px;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245064_gnome-eyes.png"><img class="alignleft size-full wp-image-3186" title="1301245064_gnome-eyes" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245064_gnome-eyes.png" alt="" width="24" height="24" /></a> Getting a View into Garbage Collection</h4>
<p style="text-align: left;">You can use the following command line options to generate information about the garbage collection process:</p>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>
<pre>-verbose:gc</pre>
</td>
<td>Print info about heap and gc on each collection.</td>
</tr>
<tr>
<td>
<pre>-XX:+PrintGCDetails</pre>
</td>
<td>(Since 1.4) Print additional garbage collection info.</td>
</tr>
<tr>
<td>
<pre>-XX:+PrintGCTimeStamps</pre>
</td>
<td>(Since 1.4) Add timestamps to the garbage collection logs.</td>
</tr>
<tr>
<td>
<pre>-Xloggc:C:\whereever\gc.log</pre>
</td>
<td>Specify log file.</td>
</tr>
</tbody>
</table>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245296_package_development.png"><img class="alignleft size-full wp-image-3191" title="1301245296_package_development" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245296_package_development.png" alt="" width="16" height="16" /></a> There are various tools to then help you decipher these logs. One is <a href="http://www.tagtraum.com/gcviewer.html">GCViewer</a> &#8211; though it only knows how to read gc logs up to Java 5.0 (though it can partially read 6.0 files). Another nice option from IBM is <a href="http://www.alphaworks.ibm.com/tech/pmat">PMAT</a>, and it can read Java 6 gc logs.</p>
<p style="text-align: left;">There is also a very cool tool called <a href="http://java.sun.com/performance/jvmstat/visualgc.html">VisualGC</a> that you can use to visually watch how objects move between spaces in real time as your application is running. This is available as a standalone application, or as a plugin for both <a href="http://netbeans.org/">Netbeans</a> and <a href="http://visualvm.java.net/">VisualVM</a>.</p>
<p style="text-align: left;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/visualgc.jpg"><img class="aligncenter size-full wp-image-3172" title="visualgc" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/visualgc.jpg" alt="" width="200" height="137" /></a></p>
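<p style="text-align: left;">Besides log files and visual tools, the same basic counters are available from inside the application through the standard <code>GarbageCollectorMXBean</code>. This is a minimal sketch (the class name <code>GcStats</code> is made up); the collector names it prints depend on the collectors your JVM has selected, and the counts will vary from run to run.</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStats {

    // Prints collection count and accumulated time per collector,
    // and returns the number of collectors the JVM reports.
    static int report() {
        int collectors = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
            collectors++;
        }
        return collectors;
    }

    public static void main(String[] args) {
        // Create some short lived garbage so there is something to collect.
        byte[][] recent = new byte[16][];
        for (int i = 0; i != 200000; i++) {
            recent[i % recent.length] = new byte[1024];
        }
        report();
    }
}
```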
<p style="text-align: left;">
<p style="text-align: left;">
<p><a name="thecollectors"></a></p>
<h2 style="padding-top: 10px;">The Garbage Collectors</h2>
<p style="text-align: left;"><em>The following applies to the Sun Java implementation as well as OpenJDK.</em></p>
<p style="text-align: left;">There are three main garbage collection schemes that you should concern yourself with (much of this applies to Java 1.4, but in general, I am targeting Java 1.5 and up). These schemes are often called collectors themselves, but generally each involves two collectors &#8211; one for the old space and one for the new space. These collector schemes are often referred to by their old space collector names: <strong>the Serialized Collector</strong>, <strong>the Throughput Collector</strong>, and <strong>the Concurrent Low Pause Collector</strong>.</p>
<p style="text-align: left;">There is also an older incremental collector (unsupported and also called the train collector), and an incremental collection mode for the concurrent low pause collector (that I touch on and is generally used when only one or two CPUs are available), but I&#8217;ll leave those for you to explore on your own if you are interested.</p>
<p style="text-align: left;">
<h4 style="padding-top: 8px;"><strong><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png"><img class="size-full wp-image-3181 alignleft" title="garbage" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png" alt="" width="25" height="25" /></a>The Serialized Collector</strong></h4>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td><span style="color: #0000ff;">Cmd Line Arg</span></td>
<td>
<pre>-XX:+UseSerialGC (Since 5.0)</pre>
</td>
</tr>
<tr>
<td><span style="color: #0000ff;">New Space Collector</span></td>
<td><strong>Serial</strong> &#8211; single threaded, stop the world, copying collector</td>
</tr>
<tr>
<td><span style="color: #0000ff;">Old Space Collector</span></td>
<td><strong>Serial Old</strong> &#8211; single threaded, stop the world, mark-sweep-compact collector</td>
</tr>
</tbody>
</table>
<p style="text-align: left;">With the serialized collector, a major collection is done when the tenured space is full. This is known as a “stop the world” collection, because all application threads will be paused while the collection occurs.</p>
<p style="text-align: left;">This collector is best used with small applications, applications run on a single CPU machine, or applications where pause times don&#8217;t matter. This collector is relatively efficient because it does not need to communicate between threads, but you have to be willing to accept its “stop the world” pauses. Minor collections will &#8220;stop the world&#8221; as well, but are generally fairly efficient and fast.</p>
<p style="text-align: left;">This collector is the only one that I have seen to respect <span style="color: #0000ff;">-XX:MaxHeapFreeRatio</span> &#8211; though that still only happens if a full collection is triggered. If you were trying to keep your RAM usage to a minimum, and always return as much memory as possible to the operating system, using the serialized collector and an aggressive -XX:MaxHeapFreeRatio can be a good strategy. You might want to occasionally force a full collection with System.gc() when your application is idle.</p>
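<p style="text-align: left;">Here is a small sketch of that idle-time idea. The class name <code>IdleCompaction</code> is made up, the allocation is only there to give the heap a reason to grow, and <code>System.gc()</code> is just a request, so whether the committed size actually drops depends on your JVM, collector and settings.</p>

```java
final class IdleCompaction {

    // Heap currently committed by the JVM, in megabytes.
    static long committedMb() {
        return Runtime.getRuntime().totalMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        byte[][] blocks = new byte[16][];
        for (int i = 0; i != blocks.length; i++) {
            blocks[i] = new byte[1024 * 1024]; // hold on to about 16 MB
        }
        System.out.println("committed while busy: " + committedMb() + " MB");

        blocks = null; // drop the only references
        System.gc();   // a request for a full collection, the JVM may ignore it
        System.out.println("committed after gc:   " + committedMb() + " MB");
    }
}
```

<p style="text-align: left;">One way to try it is <code>java -XX:+UseSerialGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 IdleCompaction</code>. The minimum ratio is lowered as well because it has to stay at or below the maximum.</p>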
<p style="text-align: left;">
<h4 style="padding-top: 8px;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png"><img class="alignleft size-full wp-image-3181" title="garbage" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png" alt="" width="25" height="25" /></a>The Throughput Collector  (also known as the Parallel Collector)</h4>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td><span style="color: #0000ff;">Cmd Line Arg</span></td>
<td>
<pre>-XX:+UseParallelGC (Since 1.4.1)</pre>
</td>
</tr>
<tr>
<td><span style="color: #0000ff;">New Space Collector</span></td>
<td><strong>Parallel Scavenge</strong> &#8211; multi threaded, stop the world, copying collector</td>
</tr>
<tr>
<td><span style="color: #0000ff;">Old Space Collector</span></td>
<td><strong>Serial Old</strong> &#8211; single threaded, stop the world, mark-sweep-compact collector</td>
</tr>
</tbody>
</table>
<p style="text-align: left;">The throughput collector uses a parallel version of the young generation collector, while the tenured generation will still use the serial collector. So while a single thread will still perform collections on the tenured space, multiple threads will work together collecting the young space.</p>
<p style="text-align: left;">A feature called parallel compaction was added in Java 1.5 update 6 – this feature allows the throughput collector to also perform major collections in parallel. You can enable this with<span style="color: #0000ff;"> -XX:+UseParallelOldGC</span>. Using this should help a lot with scalability, as you sidestep the single collection thread bottleneck on very large heaps (multi gigabyte). I&#8217;ve read this can actually lower performance on smaller heaps due to lock contention.</p>
<p style="text-align: left;">The throughput collector should be the default collector chosen on <a href="http://www.oracle.com/technetwork/java/ergo5-140223.html">server class machines</a> (in Java 1.5 and up), but there are exceptions &#8211; for example, my MacBook Pro defaults to the CMS collector. You can always override these defaults.</p>
<p style="text-align: left;">Throughput is usually most useful when your application has a large number of threads creating  new objects, and you have more than one processor available (though more than two is best). Typically, when you have multiple threads allocating objects,  you also want to increase the size of the young generation.  The number of garbage collector threads will generally be equal to the number of processors you have, but you can control that number with <span style="color: #0000ff;">-XX:ParallelGCThreads</span>=n. Sometimes you will want to lower the number of threads because each will reserve a part of the tenured generation for promotions – this can cause a fragmentation effect and effectively lower the size of the tenured generation (this is generally only an issue if your application has access to many processors or cores).</p>
<p style="text-align: left;">The throughput collector also supports something called Ergonomics. As part of this support, you can specify various desired behaviors for your application, and the JVM will attempt to tune various settings to meet your goals.</p>
<p style="text-align: left;"><span style="color: #0000ff;">-XX:MaxGCPauseMillis</span>=n is a hint to the throughput collector that a max pause time of n milliseconds is desired. By default there is no hint. The collector will adjust the heap size and other collection parameters in an attempt to meet the hint – keep in mind that throughput may be sacrificed in the attempt to meet this goal. There is also no guarantee that the goal will be met.</p>
<p style="text-align: left;">You can also specify a target goal for how much time is spent in garbage collection in comparison to running your application using <span style="color: #0000ff;">-XX:GCTimeRatio</span>=n, which asks for no more than 1/(1+n) of the total time to be spent collecting. By default the goal is 1%, which corresponds to n=99 (keep in mind that these defaults tend to change from release to release).</p>
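<p style="text-align: left;">The GCTimeRatio value is a ratio rather than a percentage, which is easy to trip over: a setting of n asks the JVM to keep collection time to about 1/(1+n) of the total. The sketch below only does that arithmetic so you can see what a given setting asks for; the class and method names are made up for the example.</p>

```java
final class GcTimeGoal {

    // Fraction of total time the collector is allowed for -XX:GCTimeRatio=n.
    static double gcFraction(int n) {
        return 1.0 / (1 + n);
    }

    public static void main(String[] args) {
        for (int n : new int[] { 99, 19, 9 }) {
            System.out.printf("-XX:GCTimeRatio=%d asks for at most %.1f%% of time in GC%n",
                    n, 100.0 * gcFraction(n));
        }
    }
}
```

<p style="text-align: left;">So a smaller n is the more relaxed goal: n=19 allows about 5% of the time to go to collection, while n=99 allows about 1%.</p>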
<p style="text-align: left;">With the serialized garbage collector a generation is collected when it is full (i.e., when no further allocations can be done from that generation). This is also true of the throughput collector.</p>
<p style="text-align: left;">
<h4 style="padding-top: 8px;"><strong><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png"><img class="alignleft size-full wp-image-3181" title="garbage" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png" alt="" width="25" height="25" /></a>The Concurrent Low Pause Collector</strong></h4>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td><span style="color: #0000ff;">Cmd Line Arg</span></td>
<td>
<pre>-XX:+UseConcMarkSweepGC (Since 1.4.1)</pre>
</td>
</tr>
<tr>
<td><span style="color: #0000ff;">New Space Collector</span></td>
<td><strong>Par New</strong> &#8211; multi threaded, stop the world, copying collector that works with CMS</td>
</tr>
<tr>
<td><span style="color: #0000ff;">Old Space Collector</span></td>
<td>Usually <strong>CMS</strong>, the mostly concurrent low pause collector &#8211; unless there is a concurrent mode failure, in which case, <strong>Serial Old </strong>- single threaded, stop the world, mark-sweep-compact collector</td>
</tr>
</tbody>
</table>
<p style="text-align: left;">Use the concurrent low pause collector when you can afford to share the processor resources with the garbage collector while the application is running. This is usually good for an application with a lot of long lived data – meaning you need a large tenured generation space. Obviously, having multiple processors is also helpful. This collector still pauses the application threads twice in a collection – once briefly at the start (when it marks objects directly accessible from root objects), and a slightly longer pause towards the middle (when it re-marks to find what it missed during concurrent marking) – the rest of the collection is done concurrently using one of the available processors (or one thread). If this collector cannot complete collecting the tenured space before it is full, all threads will be paused and a full collection performed – this is known as a concurrent mode failure and likely means you need to adjust the concurrent collection parameters.</p>
<p style="text-align: left;">This collector is used for the tenured generation, and does the collection concurrently with the execution of the application. This collector can also be paired with a parallel version of the young generation collector (<span style="color: #0000ff;">-XX:+UseParNewGC</span>).</p>
<p style="text-align: left;">Note that <span style="color: #0000ff;">-XX:+UseParallelGC</span> (the throughput collector) should not be used with <span style="color: #0000ff;">-XX:+UseConcMarkSweepGC</span>; most modern JVMs will fail on startup if you try. The same goes for <span style="color: #0000ff;">-XX:+UseParallelOldGC</span>.</p>
<p style="text-align: left;">The concurrent low pause collector keeps statistics so that it can best guess when to start collecting (so that it finishes before the tenured space is full). It will also start collecting when the tenured space hits a percentage of what&#8217;s available. You can manually set this with <span style="color: #0000ff;">-XX:CMSInitiatingOccupancyFraction</span>=n. The default for this setting varies across JVMs. I&#8217;ve read that the default for 1.5 was 68%, while the default for 1.6 is 92%. You can lower this if needed so that the collection kicks off sooner and is more likely to finish before the tenured space is full.</p>
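<p style="text-align: left;">As a rough sketch of what lowering the threshold looks like on the command line (the value 70 is purely illustrative, not a recommendation):</p>
<pre>-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly</pre>
<p style="text-align: left;">The extra <span style="color: #0000ff;">-XX:+UseCMSInitiatingOccupancyOnly</span> flag tells the JVM to rely only on the configured occupancy threshold rather than also starting collections based on its own statistics. Leave it off if you want the collector to keep using its own estimates alongside your threshold.</p>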
<p style="text-align: left;">The concurrent low pause collector can also be used in an incremental mode that I will not go into here. This mode causes the low pause collector to occasionally yield the processor used for concurrent collection back to the application, thereby lessening its impact on application performance.</p>
<p style="text-align: left;">
<h5><strong>The Parallel Young Generation Collector</strong></h5>
<p style="text-align: left;"><span style="color: #0000ff;">-XX:+UseParNewGC</span></p>
<p style="text-align: left;">This collector is much like the throughput collector in that it collects the young generation in parallel. Most of what applies to the throughput collector also applies to this collector; however, a different implementation is used that allows this collector to work in conjunction with the concurrent low pause collector, unlike the throughput collector. Despite some Sun/Oracle literature indicating this is off by default, it does seem to be on by default when using CMS in at least Java 6. You can disable it with:</p>
<pre>-XX:+UseConcMarkSweepGC -XX:-UseParNewGC</pre>
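<p style="text-align: left;">To check what your own JVM does by default, you can ask it to print its final flag values. This is a sketch that assumes a HotSpot JVM that supports <span style="color: #0000ff;">-XX:+PrintFlagsFinal</span> (later Java 6 updates and newer) and a Unix-like shell with <span style="color: #0000ff;">grep</span>:</p>
<pre>java -XX:+UseConcMarkSweepGC -XX:+PrintFlagsFinal -version | grep UseParNewGC</pre>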
<p style="text-align: left;">The flip side of that coin is that while the throughput garbage collector (<span style="color: #0000ff;">-XX:+UseParallelGC</span>) can be used with adaptive sizing (<span style="color: #0000ff;">-XX:+UseAdaptiveSizePolicy</span>), the parallel young generation collector (<span style="color: #0000ff;">-XX:+UseParNewGC</span>) cannot.</p>
<p style="text-align: left;"><span style="color: #0000ff;">-XX:+UseAdaptiveSizePolicy</span> records statistics about GC times, allocation rates, and free space, and then sizes the young and tenured generations to best fit those statistics. This is for use with the throughput collector and is on by default.</p>
<p style="text-align: left;">
<p><a name="choosingacollector"></a></p>
<h2 style="padding-top: 10px;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245231_gnome-color-chooser.png"><img class="alignleft size-full wp-image-3190" title="1301245231_gnome-color-chooser" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245231_gnome-color-chooser.png" alt="" width="16" height="16" /></a> Choosing a Collector</h2>
<p><em>Note: this article is biased towards server applications and the -server HotSpot VM.</em></p>
<p>Usually you just want to start with the Parallel (throughput) collector. It&#8217;s the one that has ergonomics, and it will automatically adjust key settings so that most server apps will do just fine. This is the default collector on most server class systems. In general, you do <strong>not</strong> need to change any garbage collection settings until you have determined you have a garbage collection issue to solve.</p>
<p>When you have to confront very large heaps, the Parallel collector can start to break down &#8211; it collects the tenured space using a stop the world collection, meaning your app is frozen while the collection happens. So when you find that the Parallel collector is just not cutting it, even when using <span style="color: #0000ff;">UseParallelOldGC</span>, you might try the mostly Concurrent Low-Pause Collector. It will collect as your application is running using a thread on the side, with two much shorter stop-the-world pauses. Overall, the CMS collector is slower in terms of throughput &#8211; but your application will likely spend less time frozen.</p>
<p>Ergonomics do not apply here, so you are on your own for coming up with good settings if the defaults don&#8217;t turn out to be a good fit &#8211; but you can often remove long &#8220;the world is stopped&#8221; pauses with this collector.</p>
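<p>As a starting point only, a CMS command line might look something like the sketch below. The heap sizes and <span style="color: #0000ff;">app.jar</span> are placeholders that you would replace with values that fit your own application and hardware:</p>
<pre>java -server -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -jar app.jar</pre>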
<p>The hope is that it is just going to make sense to always use the G1 collector in the future &#8211; it attempts to offer the best of both worlds of the throughput and mostly concurrent low pause collectors.</p>
<h2 style="padding-top: 10px;">The Garbage First (G1) Collector</h2>
<p>The <a href="http://www.google.com/url?sa=t&amp;source=web&amp;cd=3&amp;ved=0CCYQFjAC&amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.63.6386%26rep%3Drep1%26type%3Dpdf&amp;rct=j&amp;q=garbage%20first%20white%20paper&amp;ei=2GaPTZOhNMnB0QHu3-GwCw&amp;usg=AFQjCNFumDknXeOYW1e9yzUpsNCxN3H3oQ&amp;sig2=QVrnASDWxo63FZRHh7x5hg">Garbage First Collector</a> is a new garbage collector that intends to rule them all. It is available as of Sun Java 6 update 14, as well as in recent versions of OpenJDK6 and early versions of OpenJDK 7. Eventually I plan to write more about this collector. Briefly: the G1 collector should combine the best of both the throughput and mostly concurrent low pause collectors. It uses new strategies to minimize stop the world pauses and maintain high throughput on multiprocessor systems with very large heaps.</p>
<p>Try this collector with:</p>
<pre>-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>[ANNOUNCE] Solr 1.4.1 Released</title>
		<link>http://www.lucidimagination.com/blog/2010/06/28/announce-solr-1-4-1-released/</link>
		<comments>http://www.lucidimagination.com/blog/2010/06/28/announce-solr-1-4-1-released/#comments</comments>
		<pubDate>Tue, 29 Jun 2010 01:23:11 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2193</guid>
		<description><![CDATA[<p>Apache Solr 1.4.1 has been released and is now available for public<br />
download!<br />
<a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/"> http://www.apache.org/dyn/closer.cgi/lucene/solr/</a></p>
<p>Solr is the popular, blazing fast open source enterprise search<br />
platform from the Apache Lucene project.  Its major features include<br />
powerful full-text search, hit highlighting, faceted search, dynamic<br />
clustering, database integration, and rich document (e.g., Word, PDF)<br />
handling.  Solr is highly scalable, providing distributed search and<br />
index replication, and it powers the search and navigation features of<br />
many of the world&#8217;s &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Apache Solr 1.4.1 has been released and is now available for public<br />
download!<br />
<a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/"> http://www.apache.org/dyn/closer.cgi/lucene/solr/</a></p>
<p>Solr is the popular, blazing fast open source enterprise search<br />
platform from the Apache Lucene project.  Its major features include<br />
powerful full-text search, hit highlighting, faceted search, dynamic<br />
clustering, database integration, and rich document (e.g., Word, PDF)<br />
handling.  Solr is highly scalable, providing distributed search and<br />
index replication, and it powers the search and navigation features of<br />
many of the world&#8217;s largest internet sites.</p>
<p>Solr is written in Java and runs as a standalone full-text search server<br />
within a servlet container such as Tomcat.  Solr uses the Lucene Java<br />
search library at its core for full-text indexing and search, and has<br />
REST-like HTTP/XML and JSON APIs that make it easy to use from virtually<br />
any programming language.  Solr&#8217;s powerful external configuration allows<br />
it to be tailored to almost any type of application without Java coding,<br />
and it has an extensive plugin architecture when more advanced<br />
customization is required.</p>
<p>Solr 1.4.1 is a bug fix release for Solr 1.4 that includes many Solr bug<br />
fixes as well as Lucene bug fixes from Lucene 2.9.3.</p>
<p>See all of the CHANGES here:<br />
<a href="http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.1/CHANGES.txt"> http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.1/CHANGES.txt</a></p>
<p>&#8211; Mark Miller on behalf of the Solr team</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/06/28/announce-solr-1-4-1-released/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>[ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3</title>
		<link>http://www.lucidimagination.com/blog/2010/06/18/announce-release-of-lucene-java-3-0-2-and-2-9-3/</link>
		<comments>http://www.lucidimagination.com/blog/2010/06/18/announce-release-of-lucene-java-3-0-2-and-2-9-3/#comments</comments>
		<pubDate>Fri, 18 Jun 2010 17:01:29 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2188</guid>
		<description><![CDATA[<p>Hello Lucene users,</p>
<p>On behalf of the Lucene development community I would like to announce the<br />
release of Lucene Java versions 3.0.2 and 2.9.3:</p>
<p>Both releases fix bugs in the previous versions:</p>
<p>- 2.9.3 is a bugfix release for the Lucene Java 2.x series, based on Java<br />
1.4.<br />
- 3.0.2 has the same bug fix level but is for the Lucene Java 3.x series,<br />
based on Java 5.</p>
<p>New users of Lucene are advised to &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Hello Lucene users,</p>
<p>On behalf of the Lucene development community I would like to announce the<br />
release of Lucene Java versions 3.0.2 and 2.9.3:</p>
<p>Both releases fix bugs in the previous versions:</p>
<p>- 2.9.3 is a bugfix release for the Lucene Java 2.x series, based on Java<br />
1.4.<br />
- 3.0.2 has the same bug fix level but is for the Lucene Java 3.x series,<br />
based on Java 5.</p>
<p>New users of Lucene are advised to use version 3.0.2 for new developments,<br />
because it has a clean, type-safe API.</p>
<p>Important improvements in these releases include:<br />
- Fixed memory leaks in IndexWriter when large documents are indexed. It<br />
now also uses shared memory pools for term vectors and stored fields.<br />
IndexWriter now releases Fieldables and Readers on close.<br />
- NativeFSLockFactory fixes and improvements. Release write lock if<br />
exception occurs in IndexWriter ctors.<br />
- FieldCacheImpl.getStringIndex() no longer throws an exception when term<br />
count exceeds doc count.<br />
- Improve concurrency of IndexReader, especially in the context of near<br />
real-time readers.<br />
- Near real-time readers, opened while addIndexes* is running, no longer<br />
miss some segments.<br />
- Performance improvements in ParallelMultiSearcher (3.0.2 only).<br />
- IndexSearcher no longer throws NegativeArraySizeException if you pass<br />
Integer.MAX_VALUE as nDocs to search methods.</p>
<p>Both releases are fully compatible with the corresponding previous versions.<br />
We strongly recommend upgrading to 2.9.3 if you are using 2.9.x; and to<br />
3.0.2 if you are using 3.0.x.</p>
<p>See core changes at<br />
<a href="http://lucene.apache.org/java/3_0_2/changes/Changes.html"> http://lucene.apache.org/java/3_0_2/changes/Changes.html</a><br />
<a href="http://lucene.apache.org/java/2_9_3/changes/Changes.html"> http://lucene.apache.org/java/2_9_3/changes/Changes.html</a></p>
<p>and contrib changes at<br />
<a href="http://lucene.apache.org/java/3_0_2/changes/Contrib-Changes.html"> http://lucene.apache.org/java/3_0_2/changes/Contrib-Changes.html</a><br />
<a href="http://lucene.apache.org/java/2_9_3/changes/Contrib-Changes.html"> http://lucene.apache.org/java/2_9_3/changes/Contrib-Changes.html</a></p>
<p>Binary and source distributions are available at<br />
<a href="http://www.apache.org/dyn/closer.cgi/lucene/java/"> http://www.apache.org/dyn/closer.cgi/lucene/java/</a></p>
<p>Lucene artifacts are also available in the Maven2 repository at<br />
<a href="http://repo1.maven.org/maven2/org/apache/lucene/"> http://repo1.maven.org/maven2/org/apache/lucene/</a></p>
<p>-----<br />
Uwe Schindler<br />
uschindler@apache.org<br />
Apache Lucene PMC Member / Committer<br />
Bremen, Germany</p>
<p>http://lucene.apache.org/</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/06/18/announce-release-of-lucene-java-3-0-2-and-2-9-3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Lucene and Solr Development Have Merged</title>
		<link>http://www.lucidimagination.com/blog/2010/03/26/lucene-and-solr-development-have-merged/</link>
		<comments>http://www.lucidimagination.com/blog/2010/03/26/lucene-and-solr-development-have-merged/#comments</comments>
		<pubDate>Fri, 26 Mar 2010 23:06:52 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1892</guid>
		<description><![CDATA[<p>The Lucene community has recently decided to merge the development of two of its sub-projects – Lucene-&#62;Java and Lucene-&#62;Solr. Both code bases now sit under the same trunk in svn and Solr actually runs straight off the latest Lucene code at all times. This is just a merge of development though. Release artifacts will remain separate: Lucene will remain a core search engine Java library and Solr will remain a search server built on top &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>The Lucene community has recently decided to merge the development of two of its sub-projects – Lucene-&gt;Java and Lucene-&gt;Solr. Both code bases now sit under the same trunk in svn and Solr actually runs straight off the latest Lucene code at all times. This is just a merge of development though. Release artifacts will remain separate: Lucene will remain a core search engine Java library and Solr will remain a search server built on top of Lucene. From a user perspective, things will be much the same as they were – just better.</p>
<p>So what is with the merge?</p>
<p>Because of the way things worked in the past, even with many overlapping committers, many features that could benefit Lucene have been placed in Solr. They arguably “belonged” in Lucene, but due to dev issues, it benefited Solr to keep certain features that were contributed by Solr devs under Solr&#8217;s control. Moving some of this code to Lucene would mean that some Solr committers would no longer have access to it: a Solr committer who wrote and committed the code might actually lose the ability to maintain it without the assistance of a Lucene committer. And if Solr wanted to be sure to run off a stable, released version of Lucene, Solr&#8217;s release could be tied to Lucene&#8217;s latest release when some of this code needed to be updated. With Solr planning to update Lucene libs less frequently (due to the complexities of releasing with a development version of Lucene), there would be long waits for bug fixes to be available in Solr trunk.</p>
<p>All in all, there would be both pluses and minuses to refactoring Solr code into Lucene without the merge, but the majority felt the minuses outweighed the pluses. Attempts at doing this type of thing in the past have failed and resulted in similar but diverging code in both code bases. With many committers overlapping both projects, this was a very odd situation. Fix a bug in one place, and then go and look for the same bug in similar, but different code in another place &#8211; perhaps only being able to commit in one of the two spots.</p>
<p>With merged dev, there is now a single set of committers across both projects. Everyone in both communities can now drive releases – so when Solr releases, Lucene will also release – easing concerns about releasing Solr on a development version of Lucene. So now, Solr will always be on the latest trunk version of Lucene and code can be easily shared between projects – Lucene will likely benefit from Analyzers and QueryParsers that were only available to Solr users in the past. Lucene will also benefit from greater test coverage, as now you can make a single change in Lucene and run tests for both projects – getting immediate feedback on the change by testing an application that extensively uses the Lucene libraries. Both projects will also gain from a wider development community, as this change will foster more cross pollination between Lucene and Solr devs (now just Lucene/Solr devs).</p>
<p>All in all, I think this merge is going to be a big boon for both projects. A tremendous amount of work has already been done to get Solr working with the latest Lucene APIs and allow for a seamless development experience with Lucene/Solr as a single code base (the Lucene/Solr tests are ridiculously faster than they were as well!). Look for some really fantastic releases from Lucene/Solr in the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/03/26/lucene-and-solr-development-have-merged/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

