<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; Mark Miller</title>
	<atom:link href="http://www.lucidimagination.com/blog/tag/mark-miller/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Running Solr as a Service on Linux</title>
		<link>http://www.lucidimagination.com/blog/2011/08/10/running-solr-as-a-service-on-linux/</link>
		<comments>http://www.lucidimagination.com/blog/2011/08/10/running-solr-as-a-service-on-linux/#comments</comments>
		<pubDate>Wed, 10 Aug 2011 13:23:34 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3824</guid>
		<description><![CDATA[<h1 lang="en-US"><span style="font-family: Helvetica, sans-serif; font-weight: normal; font-size: small;">Let’s install Solr as a service on Linux. I’m using Ubuntu 11.04.</span></h1>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">First download the latest version of Solr (3.3 as of this writing) from: <a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/"><span style="color: #000099;"><span style="text-decoration: underline;">http://www.apache.org/dyn/closer.cgi/lucene/solr/</span></span></a></span></span></span></p>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Extract the compressed zip or tgz file to where you would like Solr to live.</span></span></span></p>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Currently, I like using runit to run Linux services. <a href="http://smarden.org/runit/"><span style="color: #000099;"><span style="text-decoration: underline;">http://smarden.org/runit/</span></span></a></span></span></span></p>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Install runit with: <strong>sudo apt-get install runit</strong></span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;"><br />
</span></span></span></p>
<p style="text-align: center;" lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-1.png"><img class="aligncenter size-full wp-image-3825" title="Screenshot-1" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-1.png" alt="" width="465" height="272" /></a></p>
<p style="text-align: center;" lang="en-US">&#160;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Create a new service directory.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-2.png"><img class="size-full wp-image-3826 alignleft" title="Screenshot-2" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-2.png" alt="" width="317" height="72" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&#160;</p>
<p lang="en-US">&#160;</p>
<p lang="en-US">&#160;</p>
<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">Create a new shell </span>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<h1 lang="en-US"><span style="font-family: Helvetica, sans-serif; font-weight: normal; font-size: small;">Let’s install Solr as a service on Linux. I’m using Ubuntu 11.04.</span></h1>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">First download the latest version of Solr (3.3 as of this writing) from: <a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/"><span style="color: #000099;"><span style="text-decoration: underline;">http://www.apache.org/dyn/closer.cgi/lucene/solr/</span></span></a></span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Extract the compressed zip or tgz file to where you would like Solr to live.</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Currently, I like using runit to run Linux services. <a href="http://smarden.org/runit/"><span style="color: #000099;"><span style="text-decoration: underline;">http://smarden.org/runit/</span></span></a></span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Install runit with: <strong>sudo apt-get install runit</strong></span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;"><br />
</span></span></span></p>
<p style="text-align: center;" lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-1.png"><img class="aligncenter size-full wp-image-3825" title="Screenshot-1" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-1.png" alt="" width="465" height="272" /></a></p>
<p style="text-align: center;" lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Create a new service directory.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-2.png"><img class="size-full wp-image-3826 alignleft" title="Screenshot-2" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-2.png" alt="" width="317" height="72" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;">Create a new shell script called run in the new /etc/sv/solr directory. You will need to have root permission to work in these directories, so use sudo. In this case, I want to run Solr as the user ‘mark’.</span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-8.png"><img class="size-full wp-image-3832 alignleft" title="Screenshot-8" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-8.png" alt="" width="326" height="75" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="font-family: Helvetica, sans-serif; font-size: small;"><br />
Make the run script executable.</span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-4.png"><img class="size-full wp-image-3828 alignleft" title="Screenshot-4" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-4.png" alt="" width="372" height="23" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Let runit know about the new service.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-5.png"><img class="size-full wp-image-3829 alignleft" title="Screenshot-5" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot-5.png" alt="" width="480" height="14" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
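<p lang="en-US">The steps above are shown only as screenshots, so here is a rough text sketch of the same sequence. It is not a transcription of the screenshots: the Solr install path, the use of <code>chpst -u</code> to switch users, and /etc/service as the directory runit scans on Ubuntu are all assumptions. By default the sketch stages everything under a scratch directory so it can be tried without root.</p>

```shell
#!/bin/sh
# Sketch only: paths and user name are assumptions, adjust them to your system.
SOLR_USER="${SOLR_USER:-mark}"                                    # user Solr should run as
SOLR_DIR="${SOLR_DIR:-/home/mark/apache-solr-3.3.0/example}"      # hypothetical install path
SV_DIR="${SV_DIR:-${TMPDIR:-/tmp}/runit-demo/sv/solr}"            # /etc/sv/solr on a real system
SERVICE_DIR="${SERVICE_DIR:-${TMPDIR:-/tmp}/runit-demo/service}"  # /etc/service on Ubuntu (assumption)

# Create the service directory.
mkdir -p "$SV_DIR" "$SERVICE_DIR"

# Write the run script: change into the Solr directory and start Jetty as the
# unprivileged user. exec replaces the shell with java, so runit supervises
# the JVM process itself.
printf '%s\n' \
  '#!/bin/sh' \
  "cd \"$SOLR_DIR\" || exit 1" \
  "exec chpst -u \"$SOLR_USER\" java -jar start.jar" \
  > "$SV_DIR/run"

# Make the run script executable.
chmod +x "$SV_DIR/run"

# Let runit know about the service by linking it into the scanned directory.
ln -sfn "$SV_DIR" "$SERVICE_DIR/solr"
```

<p lang="en-US">To apply it for real, set SV_DIR=/etc/sv/solr and SERVICE_DIR=/etc/service and run the script with sudo.</p>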
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Now Solr should be up and running. If it dies or you kill it, it will automatically be restarted. If the server is restarted, Solr will be launched on startup.</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">To stop the service: <strong>sudo sv stop solr</strong></span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">To start the service: <strong>sudo sv start solr</strong></span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Great.</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<h2 lang="en-US">Logging</h2>
<p>&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">By default, Solr logs to standard error (stderr). You likely want to add a log configuration file to have the most control over how Solr logs &#8211; see http://wiki.apache.org/solr/LoggingInDefaultJettySetup. To be lazy though (and perhaps safe), let’s make sure both standard output (stdout) and stderr are logged for us by runit.</span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">This method only logs stdout, so let’s first edit our Solr run script to redirect stderr to stdout.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot10.png"><img class="size-full wp-image-3834 alignleft" title="Screenshot10" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot10.png" alt="" width="404" height="75" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Now create a new directory called log in the /etc/sv/solr service directory. Inside this, create another script called run. This script will start the log service, run it under the user mark, and put the log files in the log directory we just made (we use . for the current working directory).</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot11.png"><img class="size-full wp-image-3835 alignleft" title="Screenshot11" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot11.png" alt="" width="253" height="36" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">As we are running as mark, change the owner of the log dir to mark so that the log files can be created: <strong>sudo chown mark log</strong></span></span></span></p>
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">Now make the new run script executable.</span></span></span></p>
<p lang="en-US"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot12.png"><img class="size-full wp-image-3836 alignleft" title="Screenshot12" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/08/Screenshot12.png" alt="" width="370" height="21" /></a></p>
<p><span style="color: #000000;"> </span></p>
<p lang="en-US">&nbsp;</p>
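<p lang="en-US">As a rough text version of the logging setup described above (the screenshots are not reproduced here), the sketch below writes both run scripts. The <code>exec 2>&1</code> line, the <code>-tt</code> timestamp flag for svlogd, and the paths are assumptions rather than the exact content of the screenshots. As before, it stages the files in a scratch directory unless SV_DIR is set.</p>

```shell
#!/bin/sh
# Sketch only: paths and user name are assumptions, adjust them to your system.
SOLR_USER="${SOLR_USER:-mark}"
SOLR_DIR="${SOLR_DIR:-/home/mark/apache-solr-3.3.0/example}"   # hypothetical install path
SV_DIR="${SV_DIR:-${TMPDIR:-/tmp}/runit-demo/sv/solr}"         # /etc/sv/solr on a real system

# Create the log directory inside the service directory.
mkdir -p "$SV_DIR/log"

# Main run script: merge stderr into stdout before starting Solr, because the
# log service only receives the service's stdout.
printf '%s\n' \
  '#!/bin/sh' \
  'exec 2>&1' \
  "cd \"$SOLR_DIR\" || exit 1" \
  "exec chpst -u \"$SOLR_USER\" java -jar start.jar" \
  > "$SV_DIR/run"

# Log run script: svlogd writes what it reads to ".", which is the log
# directory itself when runit starts this script. -tt adds readable timestamps.
printf '%s\n' \
  '#!/bin/sh' \
  "exec chpst -u \"$SOLR_USER\" svlogd -tt ." \
  > "$SV_DIR/log/run"

chmod +x "$SV_DIR/run" "$SV_DIR/log/run"

# On a real system the log directory must be writable by that user:
#   sudo chown mark /etc/sv/solr/log
```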
<p lang="en-US"><span style="color: #000000;"><span style="font-family: Helvetica, sans-serif;"><span style="font-size: small;">The next time runit starts, Solr’s output will be written to the /etc/sv/solr/log/current file and rotated automatically for you.</span></span></span></p>
<p lang="en-US">&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/08/10/running-solr-as-a-service-on-linux/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Garbage Collection Bootcamp 1.0</title>
		<link>http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/</link>
		<comments>http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/#comments</comments>
		<pubDate>Sun, 27 Mar 2011 18:01:32 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3115</guid>
		<description><![CDATA[<h3>Table Of Contents</h3>
<ul>
<li><a href="#whatisgc">What is Garbage Collection</a></li>
<li><a href="#whatisgc"></a><a href="#tuninggc">Tuning Garbage Collection</a></li>
<li><a href="#tuninggc"></a><a href="#thecollectors">The Garbage Collectors</a></li>
<li><a href="#thecollectors"></a><a href="#choosingacollector">Choosing a Collector</a></li>
</ul>
<p><a name="whatisgc"></a></p>
<h2 style="padding-top: 10px;">What is Garbage Collection</h2>
<p style="text-align: left;">Garbage collection in Java is the process of freeing the dynamic memory used by <a href="http://en.wikipedia.org/wiki/Object_(computer_science)">objects</a> that are no longer being used by an application. In languages such as C or C++, the developer is often responsible for managing dynamic memory (using <a href="http://en.wikipedia.org/wiki/Malloc">malloc</a> and free or <a href="http://en.wikipedia.org/wiki/New_(C%2B%2B)">new</a> and <a href="http://en.wikipedia.org/wiki/Delete_(C%2B%2B)">delete</a>). However, in Java, this task is left &#8230;</p>]]></description>
			<content:encoded><![CDATA[<h3>Table Of Contents</h3>
<ul>
<li><a href="#whatisgc">What is Garbage Collection</a></li>
<li><a href="#whatisgc"></a><a href="#tuninggc">Tuning Garbage Collection</a></li>
<li><a href="#tuninggc"></a><a href="#thecollectors">The Garbage Collectors</a></li>
<li><a href="#thecollectors"></a><a href="#choosingacollector">Choosing a Collector</a></li>
</ul>
<p><a name="whatisgc"></a></p>
<h2 style="padding-top: 10px;">What is Garbage Collection</h2>
<p style="text-align: left;">Garbage collection in Java is the process of freeing the dynamic memory used by <a href="http://en.wikipedia.org/wiki/Object_(computer_science)">objects</a> that are no longer being used by an application. In languages such as C or C++, the developer is often responsible for managing dynamic memory (using <a href="http://en.wikipedia.org/wiki/Malloc">malloc</a> and free or <a href="http://en.wikipedia.org/wiki/New_(C%2B%2B)">new</a> and <a href="http://en.wikipedia.org/wiki/Delete_(C%2B%2B)">delete</a>). However, in Java, this task is left up to something known as the garbage collector. A garbage collector automatically frees unused memory, freeing the developer from much of this thankless memory juggling.</p>
<p style="text-align: left;">The most basic garbage collection algorithm works by starting at the root objects (i.e. objects on the thread stack, static objects, etc.) that are live (live meaning currently in use) &#8211; and then walking down through every object reachable from them. Any object that cannot be reached in this manner is garbage and can be collected. The application is paused while this process goes on. This is referred to as mark and sweep – first you mark the objects that are live, then you sweep those that are not. The time needed to do this is obviously proportional to the number of live objects (which can be quite a large number in modern Java applications), and so more efficient collection schemes have been devised.</p>
<p style="text-align: center;"><img class="size-full wp-image-1097   aligncenter" title="Heap Spaces" src="http://www.lucidimagination.com/blog/wp-content/uploads/2009/09/gc-spaces1.png" alt="gc-spaces" width="281" height="256" /></p>
<p style="text-align: left;">One such scheme comes from the natural fact that you can divide up objects based on how long they live. Most applications create a lot of very short lived objects, and fewer objects that are around for a long time (I&#8217;ve seen estimates that for the average application, 85-98% of allocated objects are short lived). You can take advantage of this fact when doing collections. In Java, objects are allocated from a region of memory known as the <a href="http://en.wikipedia.org/wiki/Dynamic_memory_allocation">heap</a>. The Java heap is generally divided up into a few spaces (it&#8217;s usually the same across implementations, but there is the odd exception or two). The major spaces are the young generation, the tenured generation (also called the old generation), and the permanent generation. The young generation is then further subdivided into the eden space and two survivor spaces. The permanent generation is generally for objects that are around for the life of the application (interned Strings, class objects, etc.) and doesn&#8217;t usually play much of a role in garbage collection. The permanent generation size is not part of the heap region defined with -Xms and -Xmx. Though a very unusual need, it is still worth noting that the permanent generation can actually be collected if needed (when running the concurrent collector) using:<span style="color: #0000ff;"> </span></p>
<pre>-XX:+CMSPermGenSweepingEnabled</pre>
<p style="text-align: left;"><span id="more-3115"></span>When objects are first created, they are allocated within the eden space. When the eden space becomes full, the still live objects within it are copied into one of the survivor spaces (or if they don&#8217;t fit, into the tenured space). One survivor space is always left empty, and on each young generation collection (a minor collection), the live objects from the eden space and the non empty survivor space are copied into the empty survivor space.  This leaves a newly emptied survivor space for the next round, as any still live objects in the formerly full survivor space will be copied into the tenured space.</p>
<p style="text-align: left;">As you can see, rather than running over every object for every collection, you can now collect the young generation more often, and the tenured generation (long lived objects) much less often. You can also optimize your collection for the characteristics of the space – i.e. usually, almost all of the objects in the young space will be garbage. In general, an object will have to survive a couple of minor collections to make it to the tenured space (first making it into a survivor space and then the tenured space). A copying collector identifies garbage by copying live objects from one space to another &#8211; anything left over is by definition garbage. The Sun JDK uses copying collectors for the young space and mark and sweep type collectors for the tenured space.</p>
<p style="text-align: left;">
<p><a name="tuninggc"></a></p>
<h2 style="padding-top: 10px;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301246465_length-measure.png"><img class="alignleft size-full wp-image-3204" title="1301246465_length-measure" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301246465_length-measure.png" alt="" width="24" height="24" /></a>Tuning Garbage Collection</h2>
<p style="text-align: left;">Tuning for garbage collection means adjusting the sizes of the various spaces mentioned in the previous section, as well as the algorithms used to collect them. You can do this with various JVM command line options.</p>
<p style="text-align: left;">The amount of RAM available for the various spaces is dependent upon the size of the heap that the JVM has allocated. Defaults are chosen based on the hardware detected, but you can usually do better by specifying good -Xms and -Xmx values yourself. On a server machine, it can be a good idea to pin those two settings to the same value so that the JVM does not waste any time resizing the heap. You generally do not want to size the heap much larger than is needed &#8211; this can needlessly increase the cost of full garbage collections, and take RAM from other important activities, such as file system caching.</p>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>
<pre>-Xms</pre>
</td>
<td>Initial Heap Size</td>
</tr>
<tr>
<td>
<pre>-Xmx</pre>
</td>
<td>Maximum Heap Size</td>
</tr>
</tbody>
</table>
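<p style="text-align: left;">As a small illustration of pinning the two settings together, the sketch below builds a JVM option string with the same value for both. The 2g figure is only a placeholder, not a recommendation, and start.jar stands in for whatever you are launching.</p>

```shell
#!/bin/sh
# Sketch only: 2g is a placeholder, pick a size that fits your application.
HEAP_SIZE="${HEAP_SIZE:-2g}"

# Same value for the initial and maximum heap, so the JVM never resizes it.
JAVA_OPTS="-Xms${HEAP_SIZE} -Xmx${HEAP_SIZE}"

# Print the resulting command line instead of running it.
echo "java $JAVA_OPTS -jar start.jar"
```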
<p><strong>A Note About JVM Cmd Line Options</strong></p>
<ul>
<li>Boolean options &#8211;   <strong>On</strong>: <code>-XX:+&lt;option&gt;</code> <strong>Off</strong>: <code>-XX:-&lt;option&gt;</code>.</li>
<li>Numeric options:  <code>-XX:&lt;option&gt;=&lt;number&gt;</code>. Numbers can include &#8216;m&#8217; or &#8216;M&#8217; for megabytes, &#8216;k&#8217; or &#8216;K&#8217; for kilobytes, and &#8216;g&#8217; or &#8216;G&#8217; for gigabytes (1M= 1048576). In the case of Xms and Xmx, only one X is used and no colon.</li>
<li>String options: <code>-XX:&lt;option&gt;=&lt;string&gt; </code></li>
</ul>
<h4 style="padding-top: 16px;"><strong><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245152_layer-resize.png"><img class="alignleft size-full wp-image-3189" title="1301245152_layer-resize" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245152_layer-resize.png" alt="" width="16" height="16" /></a> Sizing the individual spaces</strong></h4>
<p style="text-align: left;">You usually want to grant plenty of memory to the young generation – especially when you have multiple processors – as allocation can be parallelized and each thread will get its own private piece of the eden space to work with. You generally want the young generation to have less than half the space of the tenured generation though – especially when using the Serialized collector. About 33% is usually a good number to start from. The best size will vary from application to application depending on its distribution of young vs long lived objects. You don&#8217;t want the young space to be so small that many short lived objects are getting piled into the tenured space. You also usually don&#8217;t want it to be so large that the tenured space doesn&#8217;t have enough space available to it and/or young generation collections start taking too long to complete.</p>
<p style="text-align: left;">Other than sizing the total heap, sizing the new generation (another name for the young generation) can be the most important piece to good performance.</p>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>
<pre>-XX:NewSize</pre>
</td>
<td>(Since 5.0) Size of the young generation at JVM startup – this is calculated automatically if you specify NewRatio</td>
</tr>
<tr>
<td>
<pre>-XX:MaxNewSize</pre>
</td>
<td>(Since 1.4) The largest size the young generation can grow to (unlimited if not specified)</td>
</tr>
<tr>
<td>
<pre>-Xmn</pre>
</td>
<td>Sets the new generation to a fixed size &#8211; this is not usually recommended unless you are fixing the other memory sizes as well.</td>
</tr>
<tr>
<td>
<pre>-XX:NewRatio</pre>
</td>
<td>Sets the new generation size as a ratio to the tenured generation size.</td>
</tr>
<tr>
<td>
<pre>-XX:SurvivorRatio</pre>
</td>
<td>You can also control the sizing of the survivor spaces – in practice this is not usually very helpful though.</td>
</tr>
</tbody>
</table>
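<p style="text-align: left;">To make the NewRatio arithmetic concrete: a NewRatio of 2 means the tenured generation is twice the size of the young generation, so the young generation gets one third of the heap, which matches the 33% starting point mentioned above. The sketch below works that out for a hypothetical 3072 MB heap; both numbers are placeholders.</p>

```shell
#!/bin/sh
# Sketch only: the heap size and ratio are placeholders.
HEAP_MB="${HEAP_MB:-3072}"     # total heap in megabytes (-Xms/-Xmx)
NEW_RATIO="${NEW_RATIO:-2}"    # tenured is NEW_RATIO times the size of young

# young : tenured = 1 : NEW_RATIO, so young = heap / (NEW_RATIO + 1)
YOUNG_MB=$(( HEAP_MB / (NEW_RATIO + 1) ))
TENURED_MB=$(( HEAP_MB - YOUNG_MB ))

JAVA_OPTS="-Xms${HEAP_MB}m -Xmx${HEAP_MB}m -XX:NewRatio=${NEW_RATIO}"

echo "young=${YOUNG_MB}m tenured=${TENURED_MB}m"
echo "java $JAVA_OPTS -jar start.jar"
```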
<p style="text-align: left;">The best sizing is usually chosen by playing with the parameters and then testing the performance of your application. Often, the JVM uses good defaults, or depending on the garbage collector in use, resizes the spaces on its own based on historical statistics.</p>
<p style="text-align: left;">There are a few helpful tools that give you insight into the garbage collection process.</p>
<h4 style="padding-top: 8px;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245064_gnome-eyes.png"><img class="alignleft size-full wp-image-3186" title="1301245064_gnome-eyes" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245064_gnome-eyes.png" alt="" width="24" height="24" /></a> Getting a View into Garbage Collection</h4>
<p style="text-align: left;">You can use the following command line options to generate information about the garbage collection process:</p>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td>
<pre>-verbose:gc</pre>
</td>
<td>Print info about heap and gc on each collection.</td>
</tr>
<tr>
<td>
<pre>-XX:+PrintGCDetails</pre>
</td>
<td>(Since 1.4) Print additional garbage collection info.</td>
</tr>
<tr>
<td>
<pre>-XX:+PrintGCTimeStamps</pre>
</td>
<td>(Since 1.4) Add timestamps to the garbage collection logs.</td>
</tr>
<tr>
<td>
<pre>-Xloggc:C:\wherever\gc.log</pre>
</td>
<td>Specify log file.</td>
</tr>
</tbody>
</table>
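<p style="text-align: left;">Put together, the options in the table might be combined as in the sketch below. The log path is a hypothetical Linux location chosen for illustration, and the command is only printed, not run.</p>

```shell
#!/bin/sh
# Sketch only: the log location is a hypothetical path.
GC_LOG="${GC_LOG:-/var/log/solr/gc.log}"

# Basic GC output, extra detail, timestamps, and a dedicated log file.
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:${GC_LOG}"

echo "java $GC_LOG_OPTS -jar start.jar"
```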
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245296_package_development.png"><img class="alignleft size-full wp-image-3191" title="1301245296_package_development" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245296_package_development.png" alt="" width="16" height="16" /></a> There are various tools to then help you decipher these logs. One is <a href="http://www.tagtraum.com/gcviewer.html">GCViewer</a> &#8211; though it only knows how to read gc logs up to Java 5.0 (though it can partially read 6.0 files). Another nice option from IBM is <a href="http://www.alphaworks.ibm.com/tech/pmat">PMAT</a>, and it can read Java 6 gc logs.</p>
<p style="text-align: left;">There is also a very cool tool called <a href="http://java.sun.com/performance/jvmstat/visualgc.html">VisualGC</a> that you can use to visually watch how objects move between spaces in real time as your application is running. This is available as a standalone application, or as a plugin for both <a href="http://netbeans.org/">NetBeans</a> and <a href="http://visualvm.java.net/">VisualVM</a>.</p>
<p style="text-align: left;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/visualgc.jpg"><img class="aligncenter size-full wp-image-3172" title="visualgc" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/visualgc.jpg" alt="" width="200" height="137" /></a></p>
<p style="text-align: left;">
<p style="text-align: left;">
<p><a name="thecollectors"></a></p>
<h2 style="padding-top: 10px;">The Garbage Collectors</h2>
<p style="text-align: left;"><em>The following applies to the Sun Java implementation as well as OpenJDK.</em></p>
<p style="text-align: left;">There are three main garbage collection schemes that you should concern yourself with (much of this applies to Java 1.4, but in general, I am targeting Java 1.5 and up). These schemes are often called collectors themselves, but generally each involves two collectors &#8211; one for the old space and one for the new space. These collector schemes are often referred to by their old space collector names: <strong>the Serialized Collector</strong>, <strong>the Throughput Collector</strong>, and <strong>the Concurrent Low Pause Collector</strong>.</p>
<p style="text-align: left;">There is also an older incremental collector (unsupported, and also called the train collector), and an incremental collection mode for the concurrent low pause collector (which I touch on, and which is generally used when only one or two CPUs are available), but I&#8217;ll leave those for you to explore on your own if you are interested.</p>
<p style="text-align: left;">
<h4 style="padding-top: 8px;"><strong><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png"><img class="size-full wp-image-3181 alignleft" title="garbage" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png" alt="" width="25" height="25" /></a>The Serialized Collector</strong></h4>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td><span style="color: #0000ff;">Cmd Line Arg</span></td>
<td>
<pre>-XX:+UseSerialGC (Since 5.0)</pre>
</td>
</tr>
<tr>
<td><span style="color: #0000ff;">New Space Collector</span></td>
<td><strong>Serial</strong> &#8211; single threaded, stop the world, copying collector</td>
</tr>
<tr>
<td><span style="color: #0000ff;">Old Space Collector</span></td>
<td><strong>Serial Old</strong> &#8211; single threaded, stop the world, mark-sweep-compact collector</td>
</tr>
</tbody>
</table>
<p style="text-align: left;">With the serialized collector, a major collection is done when the tenured space is full. This is known as a “stop the world” collection, because all application threads will be paused while the collection occurs.</p>
<p style="text-align: left;">This collector is best used with small applications, applications run on a single CPU machine, or applications where pause times don&#8217;t matter. This collector is relatively efficient because it does not need to communicate between threads, but you have to be willing to accept its “stop the world” pauses. Minor collections will &#8220;stop the world&#8221; as well, but are generally fairly efficient and fast.</p>
<p style="text-align: left;">This collector is the only one that I have seen respect <span style="color: #0000ff;">-XX:MaxHeapFreeRatio </span><span style="color: #0000ff;"><span style="color: #000000;">- though that still only happens if a full collection is triggered. If you are trying to keep your RAM usage to a minimum, and always return as much memory as possible to the operating system, using the serialized collector and an aggressive </span><span style="color: #0000ff;"><span style="color: #000000;">-XX:MaxHeapFreeRatio</span><span style="color: #000000;"> can be a good strategy. You might want to occasionally force a full collection with System.gc() when your application is idle.</span></span></span></p>
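<p style="text-align: left;">A minimal sketch of that low-footprint strategy follows. The percentages are placeholders chosen for illustration. It also lowers -XX:MinHeapFreeRatio, on the assumption that the minimum must not exceed the maximum and that its default would otherwise be higher than an aggressive maximum.</p>

```shell
#!/bin/sh
# Sketch only: the percentages are placeholders, not tuned values.
MIN_FREE="${MIN_FREE:-10}"   # grow the heap when less than this % is free after a collection
MAX_FREE="${MAX_FREE:-20}"   # shrink the heap when more than this % is free after a collection

JAVA_OPTS="-XX:+UseSerialGC -XX:MinHeapFreeRatio=${MIN_FREE} -XX:MaxHeapFreeRatio=${MAX_FREE}"

echo "java $JAVA_OPTS -jar start.jar"
```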
<p style="text-align: left;">
<h4 style="padding-top: 8px;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png"><img class="alignleft size-full wp-image-3181" title="garbage" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png" alt="" width="25" height="25" /></a>The Throughput Collector  (also known as the Parallel Collector)</h4>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td><span style="color: #0000ff;">Cmd Line Arg</span></td>
<td>
<pre>-XX:+UseParallelGC (Since 1.4.1)</pre>
</td>
</tr>
<tr>
<td><span style="color: #0000ff;">New Space Collector</span></td>
<td><strong>Parallel Scavenge</strong> &#8211; multi threaded, stop the world, copying collector</td>
</tr>
<tr>
<td><span style="color: #0000ff;">Old Space Collector</span></td>
<td><strong>Serial Old</strong> &#8211; single threaded, stop the world, mark-sweep-compact collector</td>
</tr>
</tbody>
</table>
<p style="text-align: left;">The throughput collector uses a parallel version of the young generation collector, while the tenured generation will still use the serial collector. So while a single thread will still perform collections on the tenured space, multiple threads will work together collecting the young space.</p>
<p style="text-align: left;">A feature called parallel compaction was added in Java 1.5 update 6 – this feature allows the throughput collector to also perform major collections in parallel. You can enable this with<span style="color: #0000ff;"> -XX:+UseParallelOldGC</span>. Using this should help a lot with scalability, as you sidestep the single collection thread bottleneck on very large heaps (multi gigabyte). I&#8217;ve read this can actually lower performance on smaller heaps due to lock contention.</p>
<p style="text-align: left;">The throughput collector should be the default collector chosen on <a href="http://www.oracle.com/technetwork/java/ergo5-140223.html">server class machines</a> (in Java 1.5 and up), but there are exceptions &#8211; for example, my MacBook Pro defaults to the CMS collector. You can always override these defaults.</p>
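<p style="text-align: left;">If you are not sure which collector your JVM picked, one way to check is to ask the standard java.lang.management API for the collectors that are active. This is a minimal sketch; the class and method names are my own, and the names the JVM reports vary by vendor and version.</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcInfo {
    /** Returns a comma separated list of the collector names reported by the running JVM. */
    static String collectorNames() {
        StringBuilder names = new StringBuilder();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (names.length() > 0) {
                names.append(", ");
            }
            names.append(bean.getName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        // Run this with and without flags such as -XX:+UseParallelGC to compare.
        System.out.println("Collectors in use: " + collectorNames());
    }
}
```

<p style="text-align: left;">On HotSpot the throughput collector typically reports its young generation collector as PS Scavenge, while CMS is paired with ParNew.</p>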
<p style="text-align: left;">Throughput is usually most useful when your application has a large number of threads creating  new objects, and you have more than one processor available (though more than two is best). Typically, when you have multiple threads allocating objects,  you also want to increase the size of the young generation.  The number of garbage collector threads will generally be equal to the number of processors you have, but you can control that number with <span style="color: #0000ff;">-XX:ParallelGCThreads</span>=n. Sometimes you will want to lower the number of threads because each will reserve a part of the tenured generation for promotions – this can cause a fragmentation effect and effectively lower the size of the tenured generation (this is generally only an issue if your application has access to many processors or cores).</p>
<p style="text-align: left;">The throughput collector also supports something called Ergonomics. As part of this support, you can specify various desired behaviors for your application, and the JVM will attempt to tune various settings to meet your goals.</p>
<p style="text-align: left;"><span style="color: #0000ff;">-XX:MaxGCPauseMillis</span>=n hints to the throughput collector that a max pause time of n milliseconds is desired. By default there is no hint. The collector will adjust the heap size and other collection parameters in an attempt to meet the hint. Keep in mind that throughput may be sacrificed in the attempt to meet this goal, and there is no guarantee that the goal will be met.</p>
<p style="text-align: left;">You can also specify a target for how much time is spent in garbage collection compared to running your application using <span style="color: #0000ff;">-XX:GCTimeRatio</span>=n, which asks the JVM to spend no more than 1/(1+n) of total time collecting. By default this goal is 1%, which corresponds to n=99 (keep in mind that these defaults tend to change from release to release).</p>
<p style="text-align: left;">With the serial garbage collector, a generation is collected when it is full (i.e., when no further allocations can be done from that generation). This is also true of the throughput collector.</p>
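<p style="text-align: left;">The relationship between the <span style="color: #0000ff;">-XX:GCTimeRatio</span> value and the percentage goal is easy to get backwards, so here is the arithmetic as a tiny sketch. The flag value n sets a goal of 1/(1+n) of total time spent in garbage collection; the class and method names are just for illustration.</p>

```java
public class GcTimeGoal {
    /** Fraction of total time the collector is allowed for -XX:GCTimeRatio=n, i.e. 1 / (1 + n). */
    static double gcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // n=99 gives a goal of 1% of total time in GC
        System.out.println("n=99 -> " + gcTimeFraction(99));
        // n=19 loosens the goal to 5% of total time in GC
        System.out.println("n=19 -> " + gcTimeFraction(19));
    }
}
```

<p style="text-align: left;">So a larger n is a stricter throughput goal, not a looser one.</p>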
<p style="text-align: left;">
<h4 style="padding-top: 8px;"><strong><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png"><img class="alignleft size-full wp-image-3181" title="garbage" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/garbage.png" alt="" width="25" height="25" /></a>The Concurrent Low Pause Collector</strong></h4>
<table border="0" cellspacing="0" cellpadding="5">
<tbody>
<tr>
<td><span style="color: #0000ff;">Cmd Line Arg</span></td>
<td>
<pre>-XX:+UseConcMarkSweepGC (Since 1.4.1)</pre>
</td>
</tr>
<tr>
<td><span style="color: #0000ff;">New Space Collector</span></td>
<td><strong>Par New</strong> &#8211; multi threaded, stop the world, copying collector that works with CMS</td>
</tr>
<tr>
<td><span style="color: #0000ff;">Old Space Collector</span></td>
<td>Usually <strong>CMS</strong>, the mostly concurrent low pause collector &#8211; unless there is a concurrent mode failure, in which case, <strong>Serial Old </strong>- single threaded, stop the world, mark-sweep-compact collector</td>
</tr>
</tbody>
</table>
<p style="text-align: left;">Use the concurrent low pause collector when you can afford to share the processor resources with the garbage collector while the application is running. This is usually good for an application with a lot of long lived data, meaning you need a large tenured generation space. Obviously, having multiple processors is also helpful. This collector still pauses the application threads twice in a collection: once briefly at the start (the initial mark, when it marks objects directly accessible from root objects), and a slightly longer pause towards the middle (the remark, when it rescans to find what it missed while marking concurrently with the application). The rest of the collection is done concurrently using one of the available processors (or one thread). If this collector cannot finish collecting the tenured space before it is full, all threads will be paused and a full collection performed. This is known as a concurrent mode failure and likely means you need to adjust the concurrent collection parameters.</p>
<p style="text-align: left;">This collector is used for the tenured generation, and does the collection concurrently with the execution of the application. This collector can also be paired with a parallel version of the young generation collector (<span style="color: #0000ff;">-XX:+UseParNewGC</span>).</p>
<p style="text-align: left;">Note that <span style="color: #0000ff;">-XX:+UseParallelGC</span> (the throughput collector) should not be used with <span style="color: #0000ff;">-XX:+UseConcMarkSweepGC</span>, and the JVM will fail on startup if you try this with most modern JVMs. The same applies to <span style="color: #0000ff;">-XX:+UseParallelOldGC</span>.</p>
<p style="text-align: left;">The concurrent low pause collector keeps statistics so that it can best guess when to start collecting (so that it finishes before the tenured space is full). It will also start collecting when the tenured space hits a percentage of what&#8217;s available. You can manually set this percentage with <span style="color: #0000ff;">-XX:CMSInitiatingOccupancyFraction</span>=n. The default for this setting varies across JVMs. I&#8217;ve read that the default for 1.5 was 68%, while the default for 1.6 is 92%. You can lower this if needed to ensure that the collection is kicked off sooner, making it more likely that the collection finishes before the tenured space is full.</p>
<p style="text-align: left;">The concurrent low pause collector can also be used in an incremental mode that I will not go into here. This mode causes the low pause collector to occasionally yield the processor used for concurrent collection back to the application, thereby lessening its impact on application performance.</p>
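<p style="text-align: left;">To get a feel for what an occupancy fraction means for a given heap, the sketch below works through the threshold arithmetic. It is an illustration only: the names and the 4 GB tenured size are assumptions, and the real collector also relies on its own statistics and may start earlier than this simple check suggests.</p>

```java
public class CmsThreshold {
    /**
     * Illustration only: true once tenured occupancy reaches the given percentage.
     * This is not the JVM's actual trigger logic, just the percentage check.
     */
    static boolean reachedInitiatingOccupancy(long usedBytes, long capacityBytes, int fractionPercent) {
        return usedBytes * 100 >= capacityBytes * fractionPercent;
    }

    public static void main(String[] args) {
        long capacity = 4L * 1024 * 1024 * 1024; // hypothetical 4 GB tenured generation
        long used = 3L * 1024 * 1024 * 1024;     // 75% of the tenured space is occupied

        // With a 68% fraction a concurrent cycle would already have been due.
        System.out.println("68% reached: " + reachedInitiatingOccupancy(used, capacity, 68));
        // With a 92% fraction the collector would still be waiting.
        System.out.println("92% reached: " + reachedInitiatingOccupancy(used, capacity, 92));
    }
}
```

<p style="text-align: left;">The higher the fraction, the less headroom the concurrent cycle has to finish before the tenured space fills up.</p>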
<p style="text-align: left;">
<h5><strong>The Parallel Young Generation Collector</strong></h5>
<p style="text-align: left;"><span style="color: #0000ff;">-XX:+UseParNewGC</span></p>
<p style="text-align: left;">This collector is much like the throughput collector in that it collects the young generation in parallel. Most of what applies to the throughput collector also applies to this collector. However, a different implementation is used that allows this collector to work in conjunction with the concurrent low pause collector, unlike the throughput collector. Despite some Sun/Oracle literature indicating this is off by default, it does seem to be on by default when using CMS in at least Java 6. You can disable it with:</p>
<pre>-XX:+UseConcMarkSweepGC -XX:-UseParNewGC</pre>
<p style="text-align: left;">The flip side of that coin is that while the throughput garbage collector (<span style="color: #0000ff;">-XX:+UseParallelGC</span>) can be used with adaptive sizing (<span style="color: #0000ff;">-XX:+UseAdaptiveSizePolicy</span>), the parallel young generation collector (<span style="color: #0000ff;">-XX:+UseParNewGC</span>) cannot.</p>
<p style="text-align: left;"><span style="color: #0000ff;">-XX:+UseAdaptiveSizePolicy</span> records statistics about GC times, allocation rates, and free space, and then sizes the young and tenured generations to best fit those statistics. This is for use with the throughput collector and is on by default.</p>
<p style="text-align: left;">
<p><a name="choosingacollector"></a></p>
<h2 style="padding-top: 10px;"><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245231_gnome-color-chooser.png"><img class="alignleft size-full wp-image-3190" title="1301245231_gnome-color-chooser" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/03/1301245231_gnome-color-chooser.png" alt="" width="16" height="16" /></a> Choosing a Collector</h2>
<p><em>Note: this article is biased towards server applications and the -server hotspot vm.</em></p>
<p>Usually you just want to start with the Parallel (throughput) collector. It&#8217;s the one that has ergonomics, and it will automatically adjust key settings so that most server apps will do just fine. This is the default collector on most server class systems. In general, you do <strong>not</strong> need to change any garbage collection settings until you have determined you have a garbage collection issue to solve.</p>
<p>When you have to confront very large heaps, the Parallel collector can start to break down &#8211; it collects the tenured space using a stop the world collection, meaning your app is frozen while the collection happens. So when you find that the Parallel collector is just not cutting it, even when using <span style="color: #0000ff;">UseParallelOldGC</span>, you might try the mostly Concurrent Low-Pause Collector. It will collect as your application is running using a thread on the side, with two much shorter stop-the-world pauses. Overall, the CMS collector is slower in terms of throughput &#8211; but your application will likely be frozen less often.</p>
<p>Ergonomics do not apply here, so you are on your own for coming up with good settings if the defaults don&#8217;t turn out to be a good fit &#8211; but you can often remove long &#8220;the world is stopped&#8221; pauses with this collector.</p>
<p>The hope is that it is just going to make sense to always use the G1 collector in the future &#8211; it attempts to offer the best of both worlds of the throughput and mostly concurrent low pause collectors.</p>
<h2 style="padding-top: 10px;">The Garbage First (G1) Collector</h2>
<p>The <a href="http://www.google.com/url?sa=t&amp;source=web&amp;cd=3&amp;ved=0CCYQFjAC&amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.63.6386%26rep%3Drep1%26type%3Dpdf&amp;rct=j&amp;q=garbage%20first%20white%20paper&amp;ei=2GaPTZOhNMnB0QHu3-GwCw&amp;usg=AFQjCNFumDknXeOYW1e9yzUpsNCxN3H3oQ&amp;sig2=QVrnASDWxo63FZRHh7x5hg">Garbage First Collector</a> is a new garbage collector that intends to rule them all. It is available in Sun Java 6 update 14 as well as recent versions of OpenJDK6 and early versions of OpenJDK 7. Eventually I plan to write more about this collector. Briefly: the G1 collector should combine the best of both the throughput and mostly concurrent low pause collectors. It uses new strategies to minimize stop the world pauses and maintain high throughput on multiprocessor systems with very large heaps.</p>
<p>Try this collector with:</p>
<pre>-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>[ANNOUNCE] Solr 1.4.1 Released</title>
		<link>http://www.lucidimagination.com/blog/2010/06/28/announce-solr-1-4-1-released/</link>
		<comments>http://www.lucidimagination.com/blog/2010/06/28/announce-solr-1-4-1-released/#comments</comments>
		<pubDate>Tue, 29 Jun 2010 01:23:11 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2193</guid>
		<description><![CDATA[<p>Apache Solr 1.4.1 has been released and is now available for public<br />
download!<br />
<a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/"> http://www.apache.org/dyn/closer.cgi/lucene/solr/</a></p>
<p>Solr is the popular, blazing fast open source enterprise search<br />
platform from the Apache Lucene project.  Its major features include<br />
powerful full-text search, hit highlighting, faceted search, dynamic<br />
clustering, database integration, and rich document (e.g., Word, PDF)<br />
handling.  Solr is highly scalable, providing distributed search and<br />
index replication, and it powers the search and navigation features of<br />
many of the world&#8217;s &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Apache Solr 1.4.1 has been released and is now available for public<br />
download!<br />
<a href="http://www.apache.org/dyn/closer.cgi/lucene/solr/"> http://www.apache.org/dyn/closer.cgi/lucene/solr/</a></p>
<p>Solr is the popular, blazing fast open source enterprise search<br />
platform from the Apache Lucene project.  Its major features include<br />
powerful full-text search, hit highlighting, faceted search, dynamic<br />
clustering, database integration, and rich document (e.g., Word, PDF)<br />
handling.  Solr is highly scalable, providing distributed search and<br />
index replication, and it powers the search and navigation features of<br />
many of the world&#8217;s largest internet sites.</p>
<p>Solr is written in Java and runs as a standalone full-text search server<br />
within a servlet container such as Tomcat.  Solr uses the Lucene Java<br />
search library at its core for full-text indexing and search, and has<br />
REST-like HTTP/XML and JSON APIs that make it easy to use from virtually<br />
any programming language.  Solr&#8217;s powerful external configuration allows<br />
it to be tailored to almost any type of application without Java coding,<br />
and it has an extensive plugin architecture when more advanced<br />
customization is required.</p>
<p>Solr 1.4.1 is a bug fix release for Solr 1.4 that includes many Solr bug<br />
fixes as well as Lucene bug fixes from Lucene 2.9.3.</p>
<p>See all of the CHANGES here:<br />
<a href="http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.1/CHANGES.txt"> http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.1/CHANGES.txt</a></p>
<p>- &#8211; Mark Miller on behalf of the Solr team</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/06/28/announce-solr-1-4-1-released/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>[ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3</title>
		<link>http://www.lucidimagination.com/blog/2010/06/18/announce-release-of-lucene-java-3-0-2-and-2-9-3/</link>
		<comments>http://www.lucidimagination.com/blog/2010/06/18/announce-release-of-lucene-java-3-0-2-and-2-9-3/#comments</comments>
		<pubDate>Fri, 18 Jun 2010 17:01:29 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2188</guid>
		<description><![CDATA[<p>Hello Lucene users,</p>
<p>On behalf of the Lucene development community I would like to announce the<br />
release of Lucene Java versions 3.0.2 and 2.9.3:</p>
<p>Both releases fix bugs in the previous versions:</p>
<p>- 2.9.3 is a bugfix release for the Lucene Java 2.x series, based on Java<br />
1.4.<br />
- 3.0.2 has the same bug fix level but is for the Lucene Java 3.x series,<br />
based on Java 5.</p>
<p>New users of Lucene are advised to &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Hello Lucene users,</p>
<p>On behalf of the Lucene development community I would like to announce the<br />
release of Lucene Java versions 3.0.2 and 2.9.3:</p>
<p>Both releases fix bugs in the previous versions:</p>
<p>- 2.9.3 is a bugfix release for the Lucene Java 2.x series, based on Java<br />
1.4.<br />
- 3.0.2 has the same bug fix level but is for the Lucene Java 3.x series,<br />
based on Java 5.</p>
<p>New users of Lucene are advised to use version 3.0.2 for new developments,<br />
because it has a clean, type-safe API.</p>
<p>Important improvements in these releases include:<br />
- Fixed memory leaks in IndexWriter when large documents are indexed. It<br />
also uses now shared memory pools for term vectors and stored fields.<br />
IndexWriter now releases Fieldables and Readers on close.<br />
- NativeFSLockFactory fixes and improvements. Release write lock if<br />
exception occurs in IndexWriter ctors.<br />
- FieldCacheImpl.getStringIndex() no longer throws an exception when term<br />
count exceeds doc count.<br />
- Improve concurrency of IndexReader, especially in the context of near<br />
real-time readers.<br />
- Near real-time readers, opened while addIndexes* is running, no longer<br />
miss some segments.<br />
- Performance improvements in ParallelMultiSearcher (3.0.2 only).<br />
- IndexSearcher no longer throws NegativeArraySizeException if you pass<br />
Integer.MAX_VALUE as nDocs to search methods.</p>
<p>Both releases are fully compatible with the corresponding previous versions.<br />
We strongly recommend upgrading to 2.9.3 if you are using 2.9.x; and to<br />
3.0.2 if you are using 3.0.x.</p>
<p>See core changes at<br />
<a href="http://lucene.apache.org/java/3_0_2/changes/Changes.html"> http://lucene.apache.org/java/3_0_2/changes/Changes.html</a><br />
<a href="http://lucene.apache.org/java/2_9_3/changes/Changes.html"> http://lucene.apache.org/java/2_9_3/changes/Changes.html</a></p>
<p>and contrib changes at<br />
<a href="http://lucene.apache.org/java/3_0_2/changes/Contrib-Changes.html"> http://lucene.apache.org/java/3_0_2/changes/Contrib-Changes.html</a><br />
<a href="http://lucene.apache.org/java/2_9_3/changes/Contrib-Changes.html"> http://lucene.apache.org/java/2_9_3/changes/Contrib-Changes.html</a></p>
<p>Binary and source distributions are available at<br />
<a href="http://www.apache.org/dyn/closer.cgi/lucene/java/"> http://www.apache.org/dyn/closer.cgi/lucene/java/</a></p>
<p>Lucene artifacts are also available in the Maven2 repository at<br />
<a href="http://repo1.maven.org/maven2/org/apache/lucene/"> http://repo1.maven.org/maven2/org/apache/lucene/</a></p>
<p>&#8212;&#8211;<br />
Uwe Schindler<br />
uschindler@apache.org<br />
Apache Lucene PMC Member / Committer<br />
Bremen, Germany</p>
<p>http://lucene.apache.org/</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/06/18/announce-release-of-lucene-java-3-0-2-and-2-9-3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Lucene and Solr Development Have Merged</title>
		<link>http://www.lucidimagination.com/blog/2010/03/26/lucene-and-solr-development-have-merged/</link>
		<comments>http://www.lucidimagination.com/blog/2010/03/26/lucene-and-solr-development-have-merged/#comments</comments>
		<pubDate>Fri, 26 Mar 2010 23:06:52 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1892</guid>
		<description><![CDATA[<p>The Lucene community has recently decided to merge the development of two of its sub-projects – Lucene-&#62;Java and Lucene-&#62;Solr. Both code bases now sit under the same trunk in svn and Solr actually runs straight off the latest Lucene code at all times. This is just a merge of development though. Release artifacts will remain separate: Lucene will remain a core search engine Java library and Solr will remain a search server built on top &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>The Lucene community has recently decided to merge the development of two of its sub-projects – Lucene-&gt;Java and Lucene-&gt;Solr. Both code bases now sit under the same trunk in svn and Solr actually runs straight off the latest Lucene code at all times. This is just a merge of development though. Release artifacts will remain separate: Lucene will remain a core search engine Java library and Solr will remain a search server built on top of Lucene. From a user perspective, things will be much the same as they were – just better.</p>
<p>So what is with the merge?</p>
<p>Because of the way things worked in the past, even with many overlapping committers, many features that could benefit Lucene have been placed in Solr. They arguably “belonged” in Lucene, but due to dev issues, it benefited Solr to keep certain features that were contributed by Solr devs under Solr&#8217;s control. Moving some of this code to Lucene would mean that some Solr committers would no longer have access to it: a Solr committer who wrote and committed the code might actually lose the ability to maintain it without the assistance of a Lucene committer. And if Solr wanted to be sure to run off a stable, released version of Lucene, Solr&#8217;s release could be tied to Lucene&#8217;s latest release when some of this code needed to be updated. With Solr planning to update Lucene libs less frequently (due to the complexities of releasing with a development version of Lucene), there would be long waits for bug fixes to be available in Solr trunk.</p>
<p>All in all, there would be both pluses and minuses to refactoring Solr code into Lucene without the merge, but the majority have felt the minuses outweighed the pluses. Attempts at doing this type of thing in the past have failed and resulted in diverging similar code in both code bases. With many committers overlapping both projects, this was a very odd situation. Fix a bug in one place, and then go and look for the same bug in similar, but different code in another place &#8211; perhaps only being able to commit in one of the two spots.</p>
<p>With merged dev, there is now a single set of committers across both projects. Everyone in both communities can now drive releases – so when Solr releases, Lucene will also release – easing concerns about releasing Solr on a development version of Lucene. So now, Solr will always be on the latest trunk version of Lucene and code can be easily shared between projects  – Lucene will likely benefit from Analyzers and QueryParsers that were only available to Solr users in the past. Lucene will also benefit from greater test coverage, as now you can make a single change in Lucene and run tests for both projects – getting immediate feedback on the change by testing an application that extensively uses the Lucene libraries. Both projects will also gain from a wider development community, as this change will foster more cross pollination between Lucene and Solr devs (now just Lucene/Solr devs).</p>
<p>All in all, I think this merge is going to be a big boon for both projects. A tremendous amount of work has already been done to get Solr working with the latest Lucene APIs and allow for a seamless development experience with Lucene/Solr as a single code base (the Lucene/Solr tests are ridiculously faster than they were as well!). Look for some really fantastic releases from Lucene/Solr in the future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/03/26/lucene-and-solr-development-have-merged/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Lucene 2.9.2 and 3.0.1</title>
		<link>http://www.lucidimagination.com/blog/2010/02/17/lucene-2-9-2-and-3-0-1/</link>
		<comments>http://www.lucidimagination.com/blog/2010/02/17/lucene-2-9-2-and-3-0-1/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 14:02:39 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1791</guid>
		<description><![CDATA[<p>The vote is on for what I think is a Lucene first &#8211; two simultaneous bug fix releases. Because the Lucene 2 series is the last to support Java 1.4, we are doing a bug fix release for 2.9 as well as for the recently released 3.0, which requires Java 1.5.</p>
<p>A little preview from the proposed release announcement:</p>
<blockquote><p>Important improvements in these releases are a increased maximum number of unique terms in each index </p>&#8230;</blockquote>]]></description>
			<content:encoded><![CDATA[<p>The vote is on for what I think is a Lucene first &#8211; two simultaneous bug fix releases. Because the Lucene 2 series is the last to support Java 1.4, we are doing a bug fix release for 2.9 as well as for the recently released 3.0, which requires Java 1.5.</p>
<p>A little preview from the proposed release announcement:</p>
<blockquote><p>Important improvements in these releases are a increased maximum number of unique terms in each index segment. They also add fixes in IndexWriter’s commit and lost document deletes in near real-time indexing. Also lots of bugs in Contrib’s Analyzers package were fixed. Additionally, the 3.0.1 release restored some public methods, that get lost during deprecation removal. If you are using Lucene in a web application environment, you will notice that the new Attribute-based TokenStream API now works correct with different class loaders.<br />
Both releases are fully compatible with the corresponding previous versions. We strongly recommend upgrading to 2.9.2 if you are using 2.9.1 or 2.9.0; and to 3.0.1 if you are using 3.0.0.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/02/17/lucene-2-9-2-and-3-0-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lucene 2.9.1 Released</title>
		<link>http://www.lucidimagination.com/blog/2009/11/09/lucene-2-9-1-released/</link>
		<comments>http://www.lucidimagination.com/blog/2009/11/09/lucene-2-9-1-released/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 01:30:04 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1307</guid>
		<description><![CDATA[<p>We were all so caught up in the fun at ApacheCon that no one announced the Lucene 2.9.1 release. It&#8217;s out, and it&#8217;s highly recommended if you are currently on 2.9.0. Check it out: <a href="http://lucene.apache.org/java/docs/#6+November+2009+-+Lucene+Java+2.9.1+available">http://lucene.apache.org/java/docs/#6+November+2009+-+Lucene+Java+2.9.1+available</a></p>
<p>To learn more about what&#8217;s new in the Lucene 2.9.1 release, check out these resources:</p>
<ul>
<li>White Paper: <a href="http://www.lucidimagination.com/developer/whitepaper/Whats-New-in-Apache-Lucene-2-9">&#8220;What&#8217;s New in Apache Lucene 2.9&#8221;</a></li>
<li>Recorded Webinar: <a href="http://www.lucidimagination.com/Solutions/Webinars/Apache-Lucene-29-Discover-Powerful-New-Features">&#8220;Apache Lucene 2.9: Discover the Powerful New Features&#8221;</a></li>
&#8230;</ul>]]></description>
			<content:encoded><![CDATA[<p>We were all so caught up in the fun at ApacheCon that no one announced the Lucene 2.9.1 release. It&#8217;s out, and it&#8217;s highly recommended if you are currently on 2.9.0. Check it out: <a href="http://lucene.apache.org/java/docs/#6+November+2009+-+Lucene+Java+2.9.1+available">http://lucene.apache.org/java/docs/#6+November+2009+-+Lucene+Java+2.9.1+available</a></p>
<p>To learn more about what&#8217;s new in the Lucene 2.9.1 release, check out these resources:</p>
<ul>
<li>White Paper: <a href="http://www.lucidimagination.com/developer/whitepaper/Whats-New-in-Apache-Lucene-2-9">&#8220;What&#8217;s New in Apache Lucene 2.9&#8221;</a></li>
<li>Recorded Webinar: <a href="http://www.lucidimagination.com/Solutions/Webinars/Apache-Lucene-29-Discover-Powerful-New-Features">&#8220;Apache Lucene 2.9: Discover the Powerful New Features&#8221;</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/11/09/lucene-2-9-1-released/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Highlighting Highlighter Thoughts</title>
		<link>http://www.lucidimagination.com/blog/2009/02/17/highlighting-highlighter-thoughts/</link>
		<comments>http://www.lucidimagination.com/blog/2009/02/17/highlighting-highlighter-thoughts/#comments</comments>
		<pubDate>Tue, 17 Feb 2009 16:29:24 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=117</guid>
		<description><![CDATA[<p style="margin-bottom: 0in;">I have some Highlighter work that I keep meaning to finish up (basic support for highlighting ConstantScoreQuerys), and so I have the Highlighter on my mind&#8230;</p>
<p><span id="more-117"></span></p>
<p style="margin-bottom: 0in;">
</p><p style="margin-bottom: 0in;"><strong>The History&#8230;</strong></p>
<p style="margin-bottom: 0in;">The first Lucene Highlighter was written and contributed to Lucene by Mark Harwood, a longtime  Lucene contrib Committer and PMC member.  Mark came up with a nice, robust, extensible API and a handful of default implementations for the API. It was a very solid Highlighter implementation that &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I have some Highlighter work that I keep meaning to finish up (basic support for highlighting ConstantScoreQuerys), and so I have the Highlighter on my mind&#8230;</p>
<p><span id="more-117"></span></p>
<p style="margin-bottom: 0in;">
<p style="margin-bottom: 0in;"><strong>The History&#8230;</strong></p>
<p style="margin-bottom: 0in;">The first Lucene Highlighter was written and contributed to Lucene by Mark Harwood, a longtime  Lucene contrib Committer and PMC member.  Mark came up with a nice, robust, extensible API and a handful of default implementations for the API. It was a very solid Highlighter implementation that has held up nicely in the face of a lot of complicated Analyzers and Filters. A variety of contributors have enriched the code over the years since then (squashing bugs and making improvements), and the Highlighter is currently fairly capable and heavily used.</p>
<p style="margin-bottom: 0in;">The Lucene contrib Highlighter was created with a focus on generating text fragments. This allows you to easily generate &#8216;keywords in context&#8217; type views (i.e., the results list from your favorite search engine). Eventually, the NullFragmenter was added, allowing you to highlight a full document as well (you could have used the API to write your own NullFragmenter before Lucene added it – one of the nice things about the Highlighter&#8217;s fairly pluggable API).</p>
<p style="margin-bottom: 0in;"><strong>Scoring and Highlighting&#8230;</strong></p>
<p style="margin-bottom: 0in;">The Highlighter works with a TokenStream and a Query. A TokenStream is just as it sounds: a stream of tokens – terms even, if that&#8217;s easier – terms with possibly additional meta-data attached (position, offsets in original text, etc). An Analyzer and a document create a TokenStream – apply the Analyzer to the document&#8217;s text, and out pop the Tokens. By comparing the tokens from the query with the tokens from the document, the Highlighter can identify which tokens should be highlighted (termFromDoc==termFromQuery? Highlight!). The highlighter works by feeding tokens from the document one at a time to a Scorer. The Scorer assigns a score to the token. The QueryScorer assigns the score based on whether the token matches a token in the query.  Fragments are then generated and scored based on the underlying token scores. Generally the token score might just be 0 or 1, but you can do gradient highlighting by expanding the range of the scores (if you pass an IndexReader to QueryScorer, it will use term index stats to modify the score based on those stats). Finally, a pluggable Formatter implementation will actually insert the highlight text (using the score to decide what, if any, text to insert).</p>
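<p style="margin-bottom: 0in;">To make the token scoring idea concrete, here is a toy sketch in plain Java. It does not use the Lucene Highlighter API at all: splitting on spaces stands in for an Analyzer and TokenStream, the score method plays the role of a 0 or 1 QueryScorer, and square brackets stand in for whatever a Formatter would insert. All names are made up for illustration, and there is no fragmenting.</p>

```java
public class ToyHighlighter {
    /** Scores a document token: 1 if it matches any query term, otherwise 0. */
    static int score(String token, String[] queryTerms) {
        for (String term : queryTerms) {
            if (term.equalsIgnoreCase(token)) {
                return 1;
            }
        }
        return 0;
    }

    /** Feeds tokens to the scorer one at a time and marks those with a positive score. */
    static String highlight(String text, String[] queryTerms) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (score(token, queryTerms) > 0) {
                // The "formatter" step: decide what text to insert based on the score.
                out.append('[').append(token).append(']');
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] query = new String[] {"quick", "fox"};
        // prints: the [quick] brown [fox]
        System.out.println(highlight("the quick brown fox", query));
    }
}
```

<p style="margin-bottom: 0in;">Note that this toy scorer ignores token positions completely, so a phrase query would light up each of its terms wherever they occur in the text.</p>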
<p style="margin-bottom: 0in;"><strong>Obtaining a TokenStream for a Document&#8230;</strong></p>
<p style="margin-bottom: 0in;">Unfortunately, the index does not store the TokenStream for a document, so when it&#8217;s time to highlight, it&#8217;s up to the user to get a valid TokenStream for the document text. Generally this means running the original text of the document through the Analyzer you used to index it. However, if you stored term vectors in your index, the position and/or offset information can be used to reconstruct the TokenStream from the index itself. Especially for large documents, this can be much faster. The TokenSources class in the Highlighter package will build a TokenStream for you, choosing the best method based on whether term vectors are available.</p>
<p style="margin-bottom: 0in;"><strong>SpanScorer &#8211; adding position sensitive highlighting&#8230;</strong></p>
<p style="margin-bottom: 0in;">A couple of years ago I became interested in adding positional support to the Highlighter. The QueryScorer implementation just checks that tokens from the query match tokens from the document, and it doesn&#8217;t take the position of the tokens into account. The result is that if you use a PhraseQuery, rather than just highlighting the phrase, each term in the phrase will be highlighted everywhere it occurs. Attempts had been made to support PhraseQuery highlighting in the past, but not in a way that took advantage of the current Highlighter framework, and not in a way that supported the other positional queries (SpanQuery, MultiPhraseQuery, etc). I wanted pretty much full Highlighting support, as well as all of the goodness that had been squeezed into the current Highlighter. The result of this desire was a new Scorer implementation called SpanScorer.</p>
<p style="margin-bottom: 0in;">The new SpanScorer would put the TokenStream into a fast single doc MemoryIndex, convert the query to a SpanQuery approximation, and call getSpans on the MemoryIndex to get all of the position hits for the document. This info is then used in scoring to filter out query terms that match doc terms, but are not in the correct position. The SpanScorer now supports almost the entire range of Lucene queries, and is just as fast as the QueryScorer for Query clauses that are not position sensitive.</p>
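<p style="margin-bottom: 0in;">The filtering idea can be shown with a small Python sketch. This is only an illustration of position-sensitive filtering for an exact phrase; it does not use a MemoryIndex or getSpans, and the function names and whitespace tokenization are assumptions, not how SpanScorer is implemented.</p>

```python
def phrase_positions(tokens, phrase):
    """Return the set of token positions that take part in an exact phrase match."""
    hits = set()
    n = len(phrase)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase:
            hits.update(range(start, start + n))
    return hits


def highlight_phrase(text, phrase):
    """Highlight only the tokens that sit at a position inside a phrase match."""
    words = text.split()
    tokens = [w.lower() for w in words]
    keep = phrase_positions(tokens, [p.lower() for p in phrase])
    return " ".join("[" + w + "]" if i in keep else w for i, w in enumerate(words))


print(highlight_phrase("new york is not a new idea in york", ["new", "york"]))
# prints: [new] [york] is not a new idea in york
```

<p style="margin-bottom: 0in;">A term-only scorer would have marked every "new" and every "york" in that sentence; the position check drops the ones that are not part of the phrase.</p>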
<p style="margin-bottom: 0in;">Lucene added the SpanScorer in release 2.4 and Solr has also added support for the SpanScorer in 1.3. To take advantage of the SpanScorer in Lucene, just use the SpanScorer rather than the QueryScorer. You can enable the SpanScorer in Solr by passing hl.usePhraseHighlighter=true with your request.</p>
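<p style="margin-bottom: 0in;">As a rough sketch, the Solr request parameters could be assembled like this in Python. The query text and the field name "content" are hypothetical; hl, hl.fl and hl.usePhraseHighlighter are the Solr highlighting parameters.</p>

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical phrase query and field name; only the parameter names are Solr's.
params = [
    ("q", '"mark harwood"'),
    ("hl", "true"),
    ("hl.fl", "content"),
    ("hl.usePhraseHighlighter", "true"),
]

query_string = urlencode(params)
print("/select?" + query_string)
```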
<p style="margin-bottom: 0in;"><strong>Other Highlighter Implementations&#8230;</strong></p>
<p style="margin-bottom: 0in;">In Lucene JIRA there are a couple of other Highlighter implementations as well. The most interesting ideas you will find in them come from the two implementations that require term vectors to be stored for the documents you want to highlight. If you can enforce that requirement (something we don&#8217;t yet want to do with the default Highlighter), you can look at just the terms in the query, rather than at each of the terms in the document. This can be a very large win on large documents. The downsides are that it&#8217;s not easy (and, as far as I know, it hasn&#8217;t been done) to highlight based on position (phrase/span queries), and the exposed API for custom hooks is less rich. And of course, you have to store term vectors to use these Highlighters.</p>
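<p style="margin-bottom: 0in;">A minimal Python sketch of the query-term-driven approach follows. The dictionary of term offsets is a stand-in for stored term vectors, and both function names are hypothetical; none of this is the code from the JIRA patches. The point is that highlight time only does one lookup per query term and never re-walks the document tokens.</p>

```python
import re


def build_term_offsets(text):
    """Index-time step: record (start, end) offsets per term, like a stored term vector."""
    offsets = {}
    for m in re.finditer(r"\w+", text):
        offsets.setdefault(m.group().lower(), []).append((m.start(), m.end()))
    return offsets


def highlight_from_offsets(text, term_offsets, query_terms):
    """Highlight-time step: look up only the query terms, never scan every token."""
    spans = []
    for term in query_terms:
        spans.extend(term_offsets.get(term.lower(), []))
    out = []
    last = 0
    # Assumes the looked-up spans do not overlap, which holds for distinct terms.
    for start, end in sorted(spans):
        out.append(text[last:start])
        out.append("[" + text[start:end] + "]")
        last = end
    out.append(text[last:])
    return "".join(out)


doc = "solr wraps lucene and lucene does the indexing"
vector = build_term_offsets(doc)
print(highlight_from_offsets(doc, vector, ["lucene"]))
# prints: solr wraps [lucene] and [lucene] does the indexing
```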
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/02/17/highlighting-highlighter-thoughts/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Investigating OOM and other JVM issues</title>
		<link>http://www.lucidimagination.com/blog/2009/02/09/investigating-oom-and-other-jvm-issues/</link>
		<comments>http://www.lucidimagination.com/blog/2009/02/09/investigating-oom-and-other-jvm-issues/#comments</comments>
		<pubDate>Mon, 09 Feb 2009 15:29:40 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=66</guid>
		<description><![CDATA[<p>When working with a large-scale Lucene/Solr installation, users sometimes run into memory issues or garbage collection performance problems. It&#8217;s not a frequent occurrence in my experience, but just like life, it happens. Whenever I run into this sort of thing, I&#8217;ve come to rely on a variety of free tools to investigate the problem. There are many other free tools I have tried, but over time, I have found the following most useful:</p>
<p><strong>jmap</strong>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>When working with a large-scale Lucene/Solr installation, users sometimes run into memory issues or garbage collection performance problems. It&#8217;s not a frequent occurrence in my experience, but just like life, it happens. Whenever I run into this sort of thing, I&#8217;ve come to rely on a variety of free tools to investigate the problem. There are many other free tools I have tried, but over time, I have found the following most useful:</p>
<p><strong>jmap</strong> <a href="http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jmap.html"><em>http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jmap.html</em></a></p>
<p>This is a great little tool that lets you get live memory info from running Java apps. jmap -histo is a very useful command: it shows a histogram of the heap for a live process. It&#8217;s often a lot easier to use this tool to see what&#8217;s taking up all that RAM than to hook up a profiler, especially since jmap comes with most JDKs these days.</p>
<p><strong>VisualGC</strong> <a href="http://java.sun.com/performance/jvmstat/visualgc.html">http://java.sun.com/performance/jvmstat/visualgc.html</a></p>
<p>An awesome tool that takes advantage of the jstat instrumentation that&#8217;s part of all JVMs these days to show you a visual representation of a live application&#8217;s memory footprint. This lets you track your heap and watch as the eden space fills up and overflows, etc. It&#8217;s a very cool tool to play around with, and is very useful for getting a feel for how garbage collection in your application works.</p>
<p><strong>jstatd</strong> <a href="http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstatd.html">http://java.sun.com/j2se/1.5.0/docs/tooldocs/share/jstatd.html</a></p>
<p>jstatd lets you run most of these tools against a remote machine. It&#8217;s somewhat legacy now, as I think the latest Java releases allow you to simply pass a command line parameter when you start the Java app to enable remote connections.</p>
<p><strong>jconsole</strong> <a href="http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html">http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html</a></p>
<p>jconsole is a powerful remote monitoring and management tool for the Java platform. If you haven&#8217;t used it yet, you really owe it to yourself to spend some time with this tool. You can view a plethora of live stats and JVM information with this GUI tool. Java 1.5 and up come with jconsole.</p>
<p><strong>Netbeans Profiler</strong> <a href="http://www.netbeans.org/">http://www.netbeans.org/</a></p>
<p>The profiler that is included with Netbeans is top notch and very easy to use. I use Eclipse, but I also generally have Netbeans installed for some of its little goodies, like the profiler. Netbeans itself has also been getting better and better, but it has not yet pulled me away from Eclipse. Netbeans does have the best and easiest to use free profiler I have used, though (the Netbeans profiler is also part of the next tool).</p>
<p><strong>VisualVM</strong> <a href="https://visualvm.dev.java.net/">https://visualvm.dev.java.net/</a></p>
<p>This is a great open source VM tool that combines jconsole-type capabilities with the Netbeans profiler and plugin framework. It&#8217;s a very cool wrapper around a lot of existing Java tools, with practically unlimited extensibility. Very interesting, and plenty useful.</p>
<p>Some of the jstat tools come with Java 1.5 distributions and up (depending on the dist), but if you are using Java 1.4, check out http://java.sun.com/performance/jvmstat/.</p>
<p>These tools are great for troubleshooting both RAM and Garbage Collection issues. To add to your garbage collection toolbox though, also check out: <a href="http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html">http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html</a></p>
<p>Trying the different garbage collection options available and monitoring with some of the above tools can be a very effective attack on poor GC performance.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/02/09/investigating-oom-and-other-jvm-issues/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Looking forward to new features in Solr 1.4</title>
		<link>http://www.lucidimagination.com/blog/2009/02/05/looking-forward-to-new-features-in-solr-14/</link>
		<comments>http://www.lucidimagination.com/blog/2009/02/05/looking-forward-to-new-features-in-solr-14/#comments</comments>
		<pubDate>Fri, 06 Feb 2009 02:27:13 +0000</pubDate>
		<dc:creator>Mark Miller</dc:creator>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[Mark Miller]]></category>

		<guid isPermaLink="false">http://blog.lucidimagination.com/?p=56</guid>
		<description><![CDATA[Looking at some of the interesting changes coming to Solr 1.4]]></description>
			<content:encoded><![CDATA[<p>It looks like the Solr team might be gearing up for a release soon. That means Solr 1.4 is likely right around the corner, and that has me thinking about the new features we can look forward to. I figured I&#8217;d highlight some of the new stuff that I personally find especially interesting:</p>
<p><b>SOLR-560: Improved Logging</b></p>
<p>This lets you work with Solr with whatever logging implementation you are most familiar with. Very cool.</p>
<p><i>Use SLF4J logging API rather than JDK logging.  The packaged .war file is shipped with a JDK logging implementation, so logging configuration for the .war should be identical to solr 1.3.  However, if you are using the .jar file, you can select which logging implementation to use by dropping a different binding. See: http://www.slf4j.org/</i></p>
<p><b>SOLR-561: Pure Java Index Replication</b></p>
<p>A new replication implementation written in Java. This is a great addition to Solr and brings the simple replication that Unix users have taken for granted to Windows. Good stuff. It&#8217;s not as battle tested as the old scripting/rsync solution, but it&#8217;s been used in production by a few people and has essentially gone through a strong beta period already. Anyone using replication for horizontal scaling should check this out.</p>
<p><i>Added Replication implemented in Java as a request handler. Supports index replication as well as configuration replication and exposes detailed statistics and progress information on the Admin page. Works on all platforms.</i></p>
<p><b>SOLR-284: Content Detection/Extraction with Tika</b></p>
<p>Apache Tika is a new Lucene sub project, and it&#8217;s looking very promising. Tika is a great content detection and extraction library that supports many popular formats: http://lucene.apache.org/tika/formats.html. This makes it a lot easier to pump most popular file types into Solr.</p>
<p><i>Added support for extracting content from binary documents like MS Word and PDF using Apache Tika.</i> </p>
<p><b>SOLR-911: Multi-Select Faceting Support</b></p>
<p>Multi-select faceting support. Awesome. I&#8217;ve been seeing it more and more every day. Solr&#8217;s facet support continues to be excellent. Check out our use of multi-select at www.lucidimagination.com/search.</p>
<p><i>Add support for multi-select faceting by allowing filters to be tagged and facet commands to exclude certain filters.  This patch also added the ability to change the output key for facets in the response, and optimized distributed faceting refinement by lowering parsing overhead and by making requests and responses smaller.</i></p>
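<p>A rough sketch of what tagging and excluding looks like in a request, built here in Python. The query text, the field name "doctype" and the tag name "dt" are hypothetical; the {!tag=...} and {!ex=...} local params are the Solr syntax for tagging a filter and excluding it when computing a facet.</p>

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical field ("doctype") and tag ("dt"), chosen only for illustration.
params = [
    ("q", "highlighting"),
    ("facet", "true"),
    # Tag the filter so facet commands can refer to it by name.
    ("fq", "{!tag=dt}doctype:pdf"),
    # Exclude the tagged filter when counting this facet, so the other
    # doctype values still show their counts after "pdf" is selected.
    ("facet.field", "{!ex=dt}doctype"),
]

query_string = urlencode(params)
print("/select?" + query_string)
```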
<p><b>SOLR-906: Buffered Updates With Solrj Over Http</b></p>
<p>More efficient index construction over HTTP with solrj. If you&#8217;re building indexes this way, this is a fantastic performance improvement.</p>
<p><i>Adding a StreamingUpdateSolrServer that writes update commands to an open HTTP connection.  If you are using solrj for bulk update requests you should consider switching to this implementation.  However, note that the error handling is not immediate as it is with the standard SolrServer.</i></p>
<p><b>SOLR-374: Index Reopen</b></p>
<p>Index reopening came to Lucene some time ago, and now comes to Solr. This means that when you add a couple of documents to Solr, rather than opening the whole index again, only the one small new segment is opened (subject to segment merging). A lot of work has gone into reopen in Lucene development recently, and it&#8217;s going to be cool to see how Solr is able to take advantage of it all. Progress towards core real-time Lucene/Solr index/search is building.</p>
<p><i>Use IndexReader.reopen to save resources by re-using parts of the index that haven&#8217;t changed.</i></p>
<p><b>SOLR-475: Faceting Performance Boost</b></p>
<p>I haven&#8217;t used this first hand, but the reviews have been stellar. This should be a fantastic performance boost from what I hear.</p>
<p><i>New faceting method with better performance and smaller memory usage for multi-valued fields with many unique values but relatively few values per document. Controllable via the facet.method parameter &#8211; &#8220;fc&#8221; is the new default method and &#8220;enum&#8221; is the original method.  </i></p>
<p><b>SOLR-84: New Solr Logo</b></p>
<p>Check out the new Solr logo. This was the winner of a community contest, and I think we really got a nice logo out of it. There were plenty of options to choose from, so it was a really successful contest <img src="http://lucene.apache.org/solr/images/solr_FC.jpg" alt="Solr Logo" /></p>
<p><i>Use new Solr logo in admin </i></p>
<p><b>And&#8230;</b></p>
<p>And then of course, there are a handful of goodies in the latest Lucene libraries that will affect Solr or lead to new Solr features shortly &#8211; and tons of other features, bug fixes, and performance improvements.</p>
<p>- Mark</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/02/05/looking-forward-to-new-features-in-solr-14/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

