<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; apache</title>
	<atom:link href="http://www.lucidimagination.com/blog/category/apache/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Bet You Didn&#8217;t Know Lucene Can&#8230;</title>
		<link>http://www.lucidimagination.com/blog/2011/11/14/bet-you-didnt-know-lucene-can/</link>
		<comments>http://www.lucidimagination.com/blog/2011/11/14/bet-you-didnt-know-lucene-can/#comments</comments>
		<pubDate>Mon, 14 Nov 2011 15:43:36 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[ApacheCon]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4418</guid>
		<description><![CDATA[<p>Here are my ApacheCon 2011 slides for my talk &#8220;Bet You Didn&#8217;t Know Lucene Can&#8230;&#8221; :</p>
<p>&#160;</p>
<div id="__ss_10155480" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Bet you didn't know Lucene can..." href="http://www.slideshare.net/gsingers/bet-you-didnt-know-lucene-can">Bet you didn&#8217;t know Lucene can&#8230;</a></strong>
<div style="padding: 5px 0 12px;">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/gsingers">gsingers</a>.</div>
&#8230;</div>]]></description>
			<content:encoded><![CDATA[<p>Here are my ApacheCon 2011 slides for my talk &#8220;Bet You Didn&#8217;t Know Lucene Can&#8230;&#8221; :</p>
<p>&nbsp;</p>
<div id="__ss_10155480" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Bet you didn't know Lucene can..." href="http://www.slideshare.net/gsingers/bet-you-didnt-know-lucene-can">Bet you didn&#8217;t know Lucene can&#8230;</a></strong><object id="__sse10155480" width="425" height="355" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=lucenecan-111114094003-phpapp01&amp;stripped_title=bet-you-didnt-know-lucene-can&amp;userName=gsingers" /><param name="allowscriptaccess" value="always" /><param name="allowfullscreen" value="true" /><embed id="__sse10155480" width="425" height="355" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=lucenecan-111114094003-phpapp01&amp;stripped_title=bet-you-didnt-know-lucene-can&amp;userName=gsingers" allowFullScreen="true" allowScriptAccess="always" allowscriptaccess="always" allowfullscreen="true" /></object></p>
<div style="padding: 5px 0 12px;">View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/gsingers">gsingers</a>.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/11/14/bet-you-didnt-know-lucene-can/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SF Bay Area Apache Mahout User Meeting on Nov. 29</title>
		<link>http://www.lucidimagination.com/blog/2011/11/05/sf-bay-area-apache-mahout-user-meeting-on-nov-29/</link>
		<comments>http://www.lucidimagination.com/blog/2011/11/05/sf-bay-area-apache-mahout-user-meeting-on-nov-29/#comments</comments>
		<pubDate>Sat, 05 Nov 2011 14:42:16 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Lucid Imagination]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[lucid imagination]]></category>
		<category><![CDATA[MapR]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4416</guid>
		<description><![CDATA[[ Tuesday, 29 November 2011; 18:30 to 21:30. ] <p>For all of those interested in Apache Mahout and scalable machine learning, Lucid Imagination is hosting a Mahout Users Meeting at its new office in Redwood City on Nov. 29th. Doors open at 6:30 pm. The night will feature two speakers, Ted Dunning of <a href="http://www.mapr.com">MapR Technologies</a> and Grant Ingersoll of <a href="http://www.lucidimagination.com">Lucid Imagination</a>, along with a social gathering with food and drinks.</p>
<p>For more details and &#8230;</p>]]></description>
			<content:encoded><![CDATA[[ Tuesday, 29 November 2011; 18:30 to 21:30. ] <p>For all of those interested in Apache Mahout and scalable machine learning, Lucid Imagination is hosting a Mahout Users Meeting at its new office in Redwood City on Nov. 29th. Doors open at 6:30 pm. The night will feature two speakers, Ted Dunning of <a href="http://www.mapr.com">MapR Technologies</a> and Grant Ingersoll of <a href="http://www.lucidimagination.com">Lucid Imagination</a>, along with a social gathering with food and drinks.</p>
<p>For more details and to RSVP, please see <a href="http://sf-mahout-11-11.eventbrite.com/">http://sf-mahout-11-11.eventbrite.com/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/11/05/sf-bay-area-apache-mahout-user-meeting-on-nov-29/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>From Barcelona to Vancouver with Lucene and Solr</title>
		<link>http://www.lucidimagination.com/blog/2011/10/22/barcelona-vancouver/</link>
		<comments>http://www.lucidimagination.com/blog/2011/10/22/barcelona-vancouver/#comments</comments>
		<pubDate>Sat, 22 Oct 2011 10:14:36 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[ApacheCon]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4364</guid>
		<description><![CDATA[<p>With another <a href="http://lucene-eurocon.com/">Lucene Eurocon</a> successfully behind us (thanks Barcelona, you&#8217;ve been awesome!), it&#8217;s time to say hello to Vancouver for <a href="http://na11.apachecon.com/">ApacheCon</a>.  I&#8217;ll leave it to others to fill in the blanks on the Barcelona conference other than to say that I am continually amazed by the vibrancy of the Lucene/Solr community and especially grateful to all the committers and contributors who take the time to show up and give talks about how they leverage &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>With another <a href="http://lucene-eurocon.com/">Lucene Eurocon</a> successfully behind us (thanks Barcelona, you&#8217;ve been awesome!), it&#8217;s time to say hello to Vancouver for <a href="http://na11.apachecon.com/">ApacheCon</a>.  I&#8217;ll leave it to others to fill in the blanks on the Barcelona conference other than to say that I am continually amazed by the vibrancy of the Lucene/Solr community and especially grateful to all the committers and contributors who take the time to show up and give talks about how they leverage the world&#8217;s premier open source search engine.</p>
<p>For me personally, I&#8217;m on to Vancouver and ApacheCon for two primary things, besides of course the community bits that go with every ApacheCon:</p>
<ol>
<li>Providing ApacheCon&#8217;s first ever <a href="http://na11.apachecon.com/talks/18395">Apache Mahout training on Monday, November 7th</a>.  It&#8217;s still not too late to sign up!</li>
<li>Giving a talk on alternative uses of Lucene/Solr other than traditional free text search (things like recommendation engines, classification, etc.)</li>
</ol>
<p>For the 2nd item, I&#8217;m also interested in hearing from you, the user, about interesting things you&#8217;ve done with Lucene/Solr that fall outside the norm of free text search.  If you care to share, please leave a comment on this post.</p>
<p>I&#8217;d be remiss if I didn&#8217;t also plug several other Lucid Imagination employees who are speaking at ApacheCon as well:</p>
<ol>
<li><a href="http://na11.apachecon.com/talks/19453">Solr Flair</a> by Erik Hatcher.  Erik will also be doing a <a href="http://na11.apachecon.com/talks/19454">2 day Solr training class</a>.  Registration is still open for this class as well.</li>
<li><a href="http://na11.apachecon.com/talks/19346">Apache Solr: Out of the Box</a> by Chris Hostetter</li>
</ol>
<p>Lucid Imagination is also sponsoring the Lucene/Solr <a href="https://wiki.apache.org/lucene-java/ApacheCon2011NaMeetup">meetup</a> on Wed. November 9th, so if you are in town, please feel free to drop by for a drink and a chat.</p>
<p>With that, I&#8217;ll simply say, I hope to see you in Vancouver in a few weeks!</p>
<p>-Grant</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/10/22/barcelona-vancouver/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mahout in Action Review</title>
		<link>http://www.lucidimagination.com/blog/2011/10/15/mahout-in-action-review/</link>
		<comments>http://www.lucidimagination.com/blog/2011/10/15/mahout-in-action-review/#comments</comments>
		<pubDate>Sat, 15 Oct 2011 13:13:18 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[Grant Ingersoll]]></category>
		<category><![CDATA[Mahout in Action]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4317</guid>
		<description><![CDATA[<p>You know your (technical) <a href="https://cwiki.apache.org/confluence/display/MAHOUT/MahoutName">baby</a> is (almost) grown up when the book on the project finally comes out.  Such is the case for Apache Mahout, thanks to <a href="http://www.manning.com">Manning Publications</a> shipping <a href="http://affiliate.manning.com/idevaffiliate.php?id=1141_219">Mahout in Action</a> this week.</p>
<p><img src="http://manning.com/owen/owen_cover150.jpg" alt="" width="150" height="187" class="alignright" float="right" />So, before I start into my review, let me first say congratulations to Sean, Robin, Ted, Ellen and Manning for producing such an excellent product.   The simplest praise I can give it is to put it on the same &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>You know your (technical) <a href="https://cwiki.apache.org/confluence/display/MAHOUT/MahoutName">baby</a> is (almost) grown up when the book on the project finally comes out.  Such is the case for Apache Mahout, thanks to <a href="http://www.manning.com">Manning Publications</a> shipping <a href="http://affiliate.manning.com/idevaffiliate.php?id=1141_219">Mahout in Action</a> this week.</p>
<p><img src="http://manning.com/owen/owen_cover150.jpg" alt="" width="150" height="187" class="alignright" float="right" />So, before I start into my review, let me first say congratulations to Sean, Robin, Ted, Ellen and Manning for producing such an excellent product.   The simplest praise I can give it is to put it on the same level as one of the best intro to technology books I know:  <a href="http://www.manning.com/affiliate/idevaffiliate.php?id=1071_147">Lucene In Action</a>.  In other words, it sets the standard by which all other Mahout books will be judged.<br />
<br />
As for the actual book, it is broken down into 3 sections, which I like to call the &#8220;three C&#8217;s&#8221;:</p>
<ol>
<li>Collaborative Filtering</li>
<li>Clustering</li>
<li>Classification</li>
</ol>
<p>So, without further ado, let&#8217;s take a deeper look at the book in this context of the three C&#8217;s.</p>
<h2>Collaborative Filtering</h2>
<p>Collaborative Filtering is by far one of the most popular parts of Mahout, being used in places like <a href="https://cwiki.apache.org/confluence/display/MAHOUT/Powered+By+Mahout">Amazon and Foursquare</a>, and this section of the book, across 5 chapters, walks you nicely through both the concepts and the practical aspects of collaborative filtering.   Chapter 2 starts by getting you up and running using the <a href="http://www.grouplens.org/">GroupLens</a> dataset for movie recommendations.  For those unfamiliar with collaborative filtering, this makes for a nice entrance into the subject with data everyone can relate to easily.  Chapter 3 then discusses how to best model your data, while chapter 4 looks at the mechanics of actually generating recommendations from the data.</p>
<p>Chapters 5 and 6 then discuss the ins and outs of taking a recommendation engine into production, including details on how to scale it out using Apache Hadoop.  I found the explanation of the Hadoop based co-occurrence process (via RecommenderJob) especially useful, as I recently committed <a href="https://issues.apache.org/jira/browse/MAHOUT-798">MAHOUT-798</a>, which uses it to build an example recommendation system based on user interaction with email.  In fact, I relied heavily on all of the concepts in this part of the book, as I first had to extract and clean the data, then properly model it before finally running the recommendation task on EC2.</p>
<p>When I first got access to the MEAP for this book (quite some time ago), I did not have a lot of background in collaborative filtering, and these chapters really helped fill in the practical details for me and provided a good foundation for the theory behind collaborative filtering.  I think this will serve others well who are looking to get started with collaborative filtering.</p>
<h2>Clustering</h2>
<p>Similar to collaborative filtering, the clustering section starts off by introducing the basic concepts and then quickly gets you up and running with an example clustering run.  Chapter 8 then gets into how best to do feature selection for clustering.  Feature selection is often one of the keys to successful clustering, so make sure you have a good grasp of the contents of the chapter before moving ahead into chapter 9, which gets into some of Mahout&#8217;s clustering algorithms.  That chapter primarily focuses on K-Means and Dirichlet, but also covers a few others.  Note, Mahout actually has a few other algorithms for clustering than the ones described, like spectral, canopy, meanshift and minhash.  Of course, some of these were added later in the book cycle, so it is hard to complain that they weren&#8217;t incorporated. Chapter 10 then covers, in my experience, one of the harder aspects of clustering, namely how to evaluate the results.  This chapter is a little bit thin, but it seems the overall field is the same, so this is not a put down on the chapter!  There simply aren&#8217;t a lot of great tools available for evaluating clustering.</p>
<p>Chapter 11 then adds some meat onto the bones of taking clustering to production, including information on leveraging clustering in a Hadoop cluster.  Chapter 12 adds some nice concreteness to the sections by looking at clustering of real data sets from <a href="http://www.twitter.com">Twitter</a>, <a href="http://last.fm">Last.fm</a> and <a href="http://www.stackoverflow.com">Stack Overflow</a>.  For those looking to kick the tires with some real data, be sure to check out that chapter.</p>
<h2>Classification</h2>
<p>Classification is very popular these days both in search and beyond, so it is great to see this set of chapters covering the topic so well in practical, accessible terms.  As you would expect, the first chapter (13) gets you up and running as well as introduces the concepts of classification.  This chapter has a great explanation of how classification works and a typical workflow for building a classifier.</p>
<p>Chapter 14 then delves into the details of actually training a classifier using Mahout&#8217;s Stochastic Gradient Descent algorithm as well as its Bayesian classifier.</p>
<p><img src="http://1.bp.blogspot.com/_t0NJvKaO1dI/SjXBGm0DCpI/AAAAAAAAD1M/ISwdVEi7dt4/s400/potatosalad.jpg" alt="" width="243" height="320" class="alignright" float="right" />The next chapter then takes a look at how best to evaluate a classifier as well as some insight into what happens when a classifier goes bad.  Be sure to check this out, as you will no doubt run into many of the issues covered.  As an aside, I couldn&#8217;t help thinking of the classic &#8220;Far Side&#8221; cartoon to the right upon reading that section heading.  The penultimate classification chapter digs into the practical aspects of deploying a classifier in production, including details on working through your scale and speed requirements.  It finishes off with an example Apache Thrift based server which some may find a useful starting point for their applications.  Finally, Mahout in Action finishes off with a Case Study of how <a href="http://www.shopittome.com">Shop It To Me</a> uses a Mahout classifier to provide recommendations of offers to customers.  As with any technical book, it is great to have some concrete discussion of how this stuff functions in the wild.</p>
<h2>What&#8217;s Missing (i.e. When&#8217;s the 2nd edition coming out?)</h2>
<p>Mahout has a number of other interesting things that are in various stages of development like frequent patternset mining, Singular Value Decomposition (feature reduction), evolutionary programming, integration libraries for input/output as well as tools for storing data in Cassandra and Mongo.  Since Mahout is developing pretty quickly, their absence from the book is no fault of the authors; I&#8217;m just listing them here so that people are aware that Mahout has more to offer, even if the three &#8220;C&#8217;s&#8221; are the most popular.</p>
<p>All in all, Mahout in Action is an excellent introduction to the project.  Naturally I&#8217;m biased, but, pun intended, I highly recommend the book!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/10/15/mahout-in-action-review/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Happy Anniversary, Lucene!  10 years at the ASF</title>
		<link>http://www.lucidimagination.com/blog/2011/09/18/happy-anniversary-lucene-10-years-at-the-asf-3/</link>
		<comments>http://www.lucidimagination.com/blog/2011/09/18/happy-anniversary-lucene-10-years-at-the-asf-3/#comments</comments>
		<pubDate>Sun, 18 Sep 2011 18:05:38 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[nutch]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=4050</guid>
		<description><![CDATA[<p>From a quiet start as a pet project to a giant in the industry, <a href="http://lucene.apache.org">Apache Lucene</a> is definitely the little (search) engine that could.  On September 18th, 2001 (at 16:29:48 UTC) Jason Van Zyl made the first <a href="http://svn.apache.org/viewvc?view=revision&#38;revision=149570">official import</a> of Doug Cutting&#8217;s Lucene project (which started in 1997 and was hosted on SourceForge) into <a href="http://www.apache.org">Apache&#8217;s</a> Jakarta project (check out the <a href="http://web.archive.org/web/20011202174653/http://jakarta.apache.org/">Wayback machine</a>).</p>
<p>And while I wasn&#8217;t around in the beginning, I thought I would &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>From a quiet start as a pet project to a giant in the industry, <a href="http://lucene.apache.org">Apache Lucene</a> is definitely the little (search) engine that could.  On September 18th, 2001 (at 16:29:48 UTC) Jason Van Zyl made the first <a href="http://svn.apache.org/viewvc?view=revision&amp;revision=149570">official import</a> of Doug Cutting&#8217;s Lucene project (which started in 1997 and was hosted on SourceForge) into <a href="http://www.apache.org">Apache&#8217;s</a> Jakarta project (check out the <a href="http://web.archive.org/web/20011202174653/http://jakarta.apache.org/">Wayback machine</a>).</p>
<p>And while I wasn&#8217;t around in the beginning, I thought I would offer up some (little) known tidbits, links, etc. about Lucene as an ode to the search library that has significantly changed the search world, as well as my own career:</p>
<ol>
<li>Lucene was <a href="http://www.lucidimagination.com/devzone/videos-podcasts/podcasts/interview-doug-cutting">Doug&#8217;s way of learning Java</a>!  How&#8217;s that for a start?  It took him 3 months, working 2 days a week to crank out the first version.</li>
<li>At the time, some commercial search engines could not do incremental updates of the index, meaning you had to re-index all your documents anytime you had an update.  Lucene has always had an incremental model, all the way through to today&#8217;s Near Real Time features that power the likes of <a href="http://www.twitter.com">Twitter</a> at 1 billion+ searches and 100M+ new documents per day.</li>
<li><a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/document/Field.java?view=markup&amp;pathrev=149570">Field myField = Field.Text(&#8220;foo&#8221;, &#8220;bar&#8221;)</a>; anyone?  Or how about Field myField = Field.UnIndexed(&#8220;foo&#8221;, &#8220;bar&#8221;);</li>
<li>Back then, Lucene had its own <a href="http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/analysis/PorterStemmer.java?view=markup&amp;pathrev=149570">PorterStemmer</a>; now we just use <a href="http://snowball.tartarus.org">Snowball</a>.</li>
<li>Only 1 of the <a href="http://web.archive.org/web/20020213045032/http://jakarta.apache.org/lucene/docs/whoweare.html">original committers</a> still remains somewhat active.</li>
<li>Read the old <a href="http://web.archive.org/web/20020203084504/http://www.lucene.com/cgi-bin/faq/faqmanager.cgi">FAQ</a>!  True as it ever was.  (Mostly)</li>
<li>Lucene 2.3 drastically improved indexing performance thanks to a thorough overhaul of the innards while barely affecting the API.  4.0 will blow the doors off of previous versions in terms of speed and efficiency.</li>
<li>Lucene is Doug&#8217;s wife&#8217;s <a href="http://www.lucidimagination.com/devzone/videos-podcasts/podcasts/interview-doug-cutting">middle name</a>.</li>
<li>Lucene has evolved from offering a single vector space scoring model to one that now <a href="http://www.lucidimagination.com/blog/2011/09/12/flexible-ranking-in-lucene-4/">offers plug-n-play</a> ranking (BM25 anyone?)</li>
<li>Lucene is ubiquitous.  It powers search on everything from mobile devices to web scale engines.  I&#8217;ve seen indexes as small as 15% of the original content.  I&#8217;ve also seen indexes grow to several billion documents in size.  Lucene has been used as a caching store, an ORM, a cross language search engine, the guts of the popular <a href="http://lucene.apache.org/solr">Solr</a> search server, the retrieval engine for IBM&#8217;s <a href="http://www-03.ibm.com/innovation/us/watson/index.html">Watson</a> as well as several commercial search engines and pretty much everything in between.</li>
<li>Did you know <a href="http://hadoop.apache.org">Apache Hadoop</a> started as a subproject of Lucene?  Doug Cutting and Mike Cafarella first built out Hadoop in order to scale out indexing for the <a href="http://nutch.apache.org">Apache Nutch</a> project.  From there it was spun out to be a top level ASF project and has gone on to be the de facto choice for large scale distributed processing, much like Lucene is the de facto choice for search!  Lucene has also spun out <a href="http://mahout.apache.org">Mahout</a>, <a href="http://tika.apache.org">Tika</a>, Lucene.NET and Lucy!</li>
</ol>
<p>As for how Lucene&#8217;s impacted me?  In 2004, I took a job at the <a href="http://www.cnlp.org">Center for Natural Language Processing</a> at Syracuse University working for Dr. Liz Liddy.  My job was to build an Arabic-English cross language search engine.  Within a day or two of starting, <a href="http://www.linkedin.com/profile/view?id=10139209&amp;trk=tyah">Ozgur Yilmazel</a> (my boss at the time) said something to the effect of &#8220;we&#8217;ll be using Lucene for the implementation.  Go learn it.&#8221;  Digging in, I quickly needed a couple of features, the biggest one being Term Vectors, so I updated a patch from an earlier version of Lucene and managed to convince the committers at the time to commit it.  From there, I kept supplying patches.  Eventually, I was asked to be a committer.  Some time after that, Yonik Seeley and Marc Krellenstein approached a bunch of the committers about starting a company and here I am today at the company we (Erik, Yonik, Marc and I) founded back in 2007, <a href="http://www.lucidimagination.com">Lucid Imagination</a>.  I feel fortunate to have the opportunity to work on hard problems in an interesting field and for that, Lucene, in no small part, I thank  you.</p>
<p>But enough of my self-indulgence, how has Lucene impacted you?  When did you first start using it?  What&#8217;s your biggest index or fastest QPS?   What ways have you used Lucene beyond that of a search engine?  Leave a comment and let us know.</p>
<p>Happy 10th Anniversary, Lucene!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/09/18/happy-anniversary-lucene-10-years-at-the-asf-3/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Putting your search skills to the test: Lucid Certified Apache Solr/Lucene Developer Program</title>
		<link>http://www.lucidimagination.com/blog/2011/05/12/putting-your-search-skills-to-the-test-lucid-certified-apache-solrlucene-developer-program/</link>
		<comments>http://www.lucidimagination.com/blog/2011/05/12/putting-your-search-skills-to-the-test-lucid-certified-apache-solrlucene-developer-program/#comments</comments>
		<pubDate>Thu, 12 May 2011 13:01:58 +0000</pubDate>
		<dc:creator>David M. Fishman</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open Source]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3438</guid>
		<description><![CDATA[<p>One of the singular qualities of search technology is its breadth: if it&#8217;s been written down (albeit digitally), you can search it, and if you can search it, you can build a search app for it. That&#8217;s part of what makes Solr/Lucene so alluring for application development &#8212; you can build it to search just about anything, for anyone, in any way. Inspiring breadth, however, can be pretty daunting to master.</p>
<p>How, then, can you &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>One of the singular qualities of search technology is its breadth: if it&#8217;s been written down (albeit digitally), you can search it, and if you can search it, you can build a search app for it. That&#8217;s part of what makes Solr/Lucene so alluring for application development &#8212; you can build it to search just about anything, for anyone, in any way. Inspiring breadth, however, can be pretty daunting to master.</p>
<p>How, then, can you know how much you know about search with Solr and Lucene? In the world of Apache open source, there&#8217;s <a href="http://apache.org/foundation/how-it-works.html#meritocracy">a clear meritocracy</a> of peer review: contributors, committers, and active membership in the PMC. In theory, it&#8217;s a distinction anyone of sufficient talent and single-minded focus can achieve &#8212; just like anyone of sufficient talent and single-minded focus can make it to the NBA, or win the Nobel prize, or join the New York Philharmonic.</p>
<p>So you probably know your stuff if you&#8217;ve won the Nobel prize, made the NBA, or played the solo for <a href="http://en.wikipedia.org/wiki/Clarinet_Concerto_%28Mozart%29">Mozart&#8217;s Clarinet Concerto</a> at <a href="http://www.barrypopik.com/index.php/new_york_city/entry/how_do_you_get_to_carnegie_hall/">Carnegie Hall</a>, or you&#8217;re a Lucene/Solr contributor-or-committer. But what if you have not done any of those things, how do you know you know? Equally important, how do your peers or potential employers know how well you know your open source search stuff?</p>
<p>While there are more professional basketball players than Lucene/Solr committers, there are many, many more capable, talented, experienced Solr/Lucene application developers who are not going to &#8216;go pro&#8217; in the Apache meritocracy. And the demand for Solr application development skills is exploding as interest and uptake of the leading open source application development technology spread like wildfire through organizations large and small. (<a href="http://lucenerevolution.com/">Lucene Revolution, May 25-26 in San Francisco</a>, will be packed with these people &#8212; <a href="http://us.ootoweb.com/luceneregistration">sign up today</a> if you haven&#8217;t already. And read on for another special  opportunity at Lucene Revolution).</p>
<p><a href="http://www.lucidimagination.com/blog/wp-content/uploads/2011/05/CertificationLogo.png"><img class="alignright size-medium wp-image-3468" title="CertificationLogo" src="http://www.lucidimagination.com/blog/wp-content/uploads/2011/05/CertificationLogo-300x287.png" alt="" width="210" height="201" /></a>It&#8217;s exactly for the broad base of interested, committed search application developers that <a href="http://www.lucidimagination.com/About/Company-News/Lucid-Imagination-Launches-Certification-Program-Apache-SolrLucene-Developers">today we&#8217;re introducing</a> the <a href="http://www.lucidimagination.com/certification">Lucid Certified Apache Solr/Lucene Developer Program</a>, a certification exam designed to benchmark development skills and experience in building applications with Apache Solr.</p>
<p>Developed with Prometric and a team of subject matter experts composed of Apache Lucene/Solr committers, developers, and trainers, the test is <a href="http://www.lucidimagination.com/certification/FAQ#a6">designed to rigorously assess</a> a broad base of search skills and experience, and to provide the closest reasonable approximation to a standard measure of them.  It&#8217;s delivered via <a href="http://www.prometric.com/lucid/default.htm">Prometric.com</a>, consists of multiple choice questions, and costs $250. The test covers a carefully selected, broad range of topics intended to reflect the real-world challenges and landscape of search application development problems, which <a href="http://www.lucidimagination.com/topics">you can see here</a>.</p>
<p><a href="http://www.opensourceconnections.com/">Eric Pugh</a>, who <a href="http://www.packtpub.com/solr-1-4-enterprise-search-server/book">wrote the book on Solr</a>, says this:</p>
<blockquote><p>“I  expect that the Lucid Imagination certification will quickly  become  the gold standard benchmark for whether someone who claims Solr  and  Lucene expertise truly possesses it. Oftentimes, a buyer of services   has to take the leap of faith from sales pitch to execution that the   knowledge is truly there. This certification can show, without a doubt,   that the holder truly has the knowledge required to deliver a  successful  Solr/Lucene implementation. In the open source world, there  are very  few marks of authenticity: committer status, published author,  and now  the Lucid Solr certification. Just as the CPA certification  shows a high  level of knowledge and ability in the accounting industry,  the Lucid  Imagination Solr certification demonstrates unquestionable  knowledge and  experience in successful Solr/Lucene search engine  implementation.”</p></blockquote>
<p>It&#8217;s important to be clear about what the certification is <strong>not:</strong></p>
<ul>
<li>It&#8217;s not easy: don&#8217;t expect to take your first Solr course one day and pass the exam the next.</li>
<li>It&#8217;s not a substitute for experience: if you&#8217;ve only built one Solr application, earlier this morning, using the wiki demo that runs locally in your browser, you won&#8217;t pass.</li>
<li>It&#8217;s not a substitute for training: taking a class from an expert may not be sufficient, but it will really help (and <a href="http://training.lucidimagination.com">we offer the most professional-grade courses</a> available; did I mention <a href="http://www.lucenerevolution.org/training">Lucene Revolution has classes available</a>, too?)</li>
<li>It&#8217;s not a casual conceptual overview: expect to answer detailed questions on everything from Lucene fundamentals to Solr debug output.</li>
<li>It&#8217;s not a simple checklist of facts: you&#8217;ll have to demonstrate judgement calls in identifying correct answers to topic areas tied to searching, indexing, deployment, data source types, etc.</li>
</ul>
<p>Testing as a pedagogical method &#8212; a mechanism for driving learning &#8212; is not the be-all-end-all of education (you probably didn&#8217;t think highly  of classmates who asked the teacher, &#8220;Will this be on the test?&#8221;). But it turns out that tests can have <a href="http://www.nytimes.com/2011/01/21/science/21memory.html">a salutary impact on acquiring and retaining knowledge</a>, according to <a href="http://www.sciencemag.org/content/early/2011/01/19/science.1199327.abstract">a recent article in Science</a>.</p>
<p>We expect that this test will help level the playing field for a broad range of application developers to acquire and prove their Solr/Lucene application development skills &#8212; and help employers who want to take full advantage of the best open-source search technology on the planet find the men and women who have the stuff to do it.</p>
<p>If you&#8217;re coming to Lucene Revolution, the exam will be offered there for free &#8212; a savings of $250 over the regular price. Details are <a href="http://lucenerevolution.com/2011/certification">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/05/12/putting-your-search-skills-to-the-test-lucid-certified-apache-solrlucene-developer-program/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache Lucene 3.1.0 and Apache Solr 3.1.0</title>
		<link>http://www.lucidimagination.com/blog/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/</link>
		<comments>http://www.lucidimagination.com/blog/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/#comments</comments>
		<pubDate>Thu, 31 Mar 2011 18:33:54 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3268</guid>
		<description><![CDATA[<p>It&#8217;s official: Apache Lucene 3.1.0 and Apache Solr 3.1.0 have been released.  Keep an eye here for more on the new features and functionality.</p>
<p>Here are the release announcements, as just sent to the mailing lists:</p>
<blockquote>
<pre>March 2011, Apache Lucene 3.1 available
The Lucene PMC is pleased to announce the release of Apache Lucene 3.1.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate </pre>&#8230;</blockquote>]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s official: Apache Lucene 3.1.0 and Apache Solr 3.1.0 have been released.  Keep an eye here for more on the new features and functionality.</p>
<p>Here are the release announcements, as just sent to the mailing lists:</p>
<blockquote>
<pre>March 2011, Apache Lucene 3.1 available
The Lucene PMC is pleased to announce the release of Apache Lucene 3.1.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release
is available for immediate download at http://www.apache.org/dyn/closer.cgi/lucene/java.
See the CHANGES.txt
file included with the release for a full list of details.

Lucene 3.1 Release Highlights

* Numerous performance improvements: faster exact PhraseQuery; merging
 favors segments with deletions; primary key lookup is faster;
 IndexWriter.addIndexes(Directory[]) uses file copy instead of
 merging; various Directory performance improvements; compound file
 is dynamically turned off for large segments; fully deleted segments
 are dropped on commit; faster snowball analyzers (in contrib);
 ConcurrentMergeScheduler is more careful about setting priority of
 merge threads.

* ReusableAnalyzerBase makes it easier to reuse TokenStreams
 correctly.

* Improved Analysis capabilities: Improved Unicode support, including
 Unicode 4, more friendly term handling (CharTermAttribute), easier
 object reuse and better support for protected words in lossy token
 filters (e.g. stemmers).

* ConstantScoreQuery now allows directly wrapping a Query.

* IndexWriter is now configured with a new separate builder API,
 IndexWriterConfig.  You can now control IndexWriter's previously
 fixed internal thread limit by calling setMaxThreadStates.

* IndexWriter.getReader is replaced by IndexReader.open(IndexWriter).
 In addition you can now specify whether deletes should be resolved
 when you open an NRT reader.

* MultiSearcher is deprecated; ParallelMultiSearcher has been
 absorbed directly into IndexSearcher.

* On 64bit Windows and Solaris JVMs, MMapDirectory is now the
 default implementation (returned by FSDirectory.open).
 MMapDirectory also enables unmapping if the JVM supports it.

* New TotalHitCountCollector just counts total number of hits.

* ReaderFinishedListener API enables external caches to evict entries
 once a segment is finished.

March 2011, Apache Solr 3.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.1.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below.  The release is
available for immediate download at http://www.apache.org/dyn/closer.cgi/lucene/solr.
See the CHANGES.txt file included with the release for a full list of
details as well as instructions on upgrading.

What's in a Version? 

The version number for Solr 3.1 was chosen to reflect the merge of
development with Lucene, which is currently also on 3.1.  Going
forward, we expect the Solr version to be the same as the Lucene
version.  Solr 3.1 contains Lucene 3.1 and is the release after Solr 1.4.1.

Solr 3.1 Release Highlights

* Numeric range facets (similar to date faceting).

* New spatial search, including spatial filtering, boosting and sorting capabilities.

* Example Velocity driven search UI at http://localhost:8983/solr/browse

* A new termvector-based highlighter

* Extended dismax (edismax) query parser which addresses some
 missing features in the dismax query parser along with some
 extensions.

* Several more components now support distributed mode:
 TermsComponent, SpellCheckComponent.

* A new Auto Suggest component.

* Ability to sort by functions.

* JSON document indexing

* CSV response format

* Apache UIMA integration for metadata extraction

* Leverages Lucene 3.1 and its inherent optimizations and bug fixes
 as well as new analysis capabilities.

* Numerous improvements, bug fixes, and optimizations.
</pre>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/03/31/apache-lucene-3-1-0-and-apache-solr-3-1-0/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Some Lucene/Solr Freebies</title>
		<link>http://www.lucidimagination.com/blog/2011/03/07/some-lucenesolr-freebies/</link>
		<comments>http://www.lucidimagination.com/blog/2011/03/07/some-lucenesolr-freebies/#comments</comments>
		<pubDate>Mon, 07 Mar 2011 22:42:52 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=3047</guid>
		<description><![CDATA[<p>While we make some of our money off of professional services and support of Apache Lucene and Solr, I thought I would pass along a few freebies when it comes to improving your Lucene or Solr application.  These are things that we usually end up telling most clients at some stage of the game.  Many of them fall under the &#8220;<a href="http://www.artima.com/intv/fixit2.html">broken windows</a>&#8221; theory of software development, so don&#8217;t expect anything too earth shattering.&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>While we make some of our money off of professional services and support of Apache Lucene and Solr, I thought I would pass along a few freebies when it comes to improving your Lucene or Solr application.  These are things that we usually end up telling most clients at some stage of the game.  Many of them fall under the &#8220;<a href="http://www.artima.com/intv/fixit2.html">broken windows</a>&#8221; theory of software development, so don&#8217;t expect anything too earth shattering.</p>
<ol>
<li>The Solr example is just that, an example.  If you haven&#8217;t cleaned up the example fields and other stuff, it tells me you haven&#8217;t thought much about your domain and how best to represent it in search.  A good chunk of our consulting work is about helping people better understand how to represent their domain in the search engine.  Take the time to think about what field types you are using, what analysis each field type does and how best to represent that content for searching, faceting, sorting, etc.  For instance, Porter/Snowball stemming is often quite aggressive; is that really what you want?  In other words, know your analysis.</li>
<li>Be prepared to state what each field is and why it is indexed/stored/multivalued/etc.  See <a href="http://wiki.apache.org/solr/FieldOptionsByUseCase">http://wiki.apache.org/solr/FieldOptionsByUseCase</a>.  This holds true for Lucene, too.</li>
<li>Likewise in the solrconfig.xml, remove unused/example RequestHandlers, etc.  Also, some of the Request Handler configurations, specifically the Spelling one, are for demo purposes.  Read the comments in the file to make sure you understand how to use it.  Cleaning up the config and the schema will make them easier to maintain and help get new devs up to speed.</li>
<li>Lucene users: faceting by loading the values from stored fields and counting them for every search is a no-no performance-wise.</li>
<li>Really large values for the Solr cache sizes are (usually) considered harmful.  Be prepared to justify why you have a filterCache size of 500K items.  Bigger isn&#8217;t always better.</li>
<li><a href="http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Search-Application-Relevance-Issues">Ad hoc relevance testing</a> is pretty close to having no relevance testing.  If you don&#8217;t know what queries work, how do you know if your users are succeeding?</li>
<li>You should always know your top X queries, top Y query terms and any queries that return zero results, at least on a daily or weekly basis. <a href="http://www.lucidimagination.com/lwe/download">LucidWorks Enterprise</a> builds this into the dashboard if you&#8217;d like to save yourself some time writing the log analysis code and still want to leverage all the power of Lucene/Solr.  FWIW, by &#8220;you&#8221; I mean anyone on your team who has a stake in search, including engineers, business owners and QA.  Not knowing what your users are asking is a recipe for poor search.</li>
<li>NFS = bad for search (usually).  It&#8217;s all right to store the index there, but you shouldn&#8217;t read/write from it (it will work, it just won&#8217;t perform, in my experience)</li>
<li>Buy <a href="http://www.lucidimagination.com/developers/documentation/books">Lucene in Action</a>.  Read it.</li>
<li>In Solr, the fastest way to index is by using multiple client threads sending batches of documents at a time.  In Lucene, the IndexWriter is thread-safe, so share it across threads as well for writes.</li>
<li>Bonus: Don&#8217;t re-invent the wheel.  I&#8217;ve seen a lot of good Lucene applications and a fair number of poorly written ones.  I&#8217;ve also seen a lot of Lucene applications that look just like Solr, but aren&#8217;t nearly as well written or as well tested.  Ask yourself, do I really need to maintain low-level Lucene code for managing IndexReaders?  Do I really need to implement my own Faceting code?  My own Replication/distributed search?  If not, consider Solr.  It really isn&#8217;t that hard to switch.  If you answer yes, great, I&#8217;d love to hear your use case.  In either case, we&#8217;d be happy to <a href="http://www.lucidimagination.com/enterprise-search-solutions/search-solutions-services">review your situation and give you best practices</a>, all the way down to the code level, if need be.</li>
</ol>
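<p>As a rough illustration of tip 7, here is a minimal Python sketch of the log analysis. It assumes you have already parsed your query log into (query, hit count) pairs; the function name <code>summarize_queries</code>, the normalization rule and the sample data are all made up for the example, so adapt them to whatever your logs actually contain.</p>
<pre>
```python
from collections import Counter


def summarize_queries(records, top_n=3):
    """Summarize (query_string, hit_count) pairs from a parsed query log."""
    query_counts = Counter()
    term_counts = Counter()
    zero_hits = Counter()
    for query, hits in records:
        # Lowercase and collapse whitespace so trivial variants count together.
        normalized = " ".join(query.lower().split())
        query_counts[normalized] += 1
        term_counts.update(normalized.split())
        if hits == 0:
            zero_hits[normalized] += 1
    return {
        "top_queries": query_counts.most_common(top_n),
        "top_terms": term_counts.most_common(top_n),
        "zero_result_queries": sorted(zero_hits),
    }


# Hypothetical sample data standing in for a real, parsed query log.
sample = [
    ("solr faceting", 42),
    ("Solr  faceting", 40),
    ("lucene nrt", 7),
    ("solr spatial", 0),
    ("luke", 0),
]
report = summarize_queries(sample, top_n=2)
print(report)
```
</pre>
<p>Run something like this daily or weekly and the zero-result list alone will usually give you a to-do list.</p>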
<p>Like I said, nothing too earth shattering, but I do come across many of these issues on a regular basis.  Also, note that these are based on my experience; your mileage may vary.  Naturally, I&#8217;d love to hear your tips as well!</p>
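<p>For tip 10 above, the shape of a multi-threaded, batching indexing client can be sketched in a few lines of Python. This is only a sketch under stated assumptions: <code>fake_send_batch</code> is a stand-in for whatever call your client library uses to post a batch of documents to Solr, and the batch size and worker count are arbitrary numbers, not recommendations.</p>
<pre>
```python
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size):
    """Yield consecutive slices of docs, each holding at most size items."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def index_in_parallel(docs, send_batch, batch_size=100, workers=4):
    """Send batches from several client threads; return the total docs sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent = list(pool.map(send_batch, batches(docs, batch_size)))
    return sum(sent)


def fake_send_batch(batch):
    # Hypothetical stand-in: a real client would post the batch to Solr here.
    return len(batch)


docs = [{"id": str(i)} for i in range(250)]
indexed = index_in_parallel(docs, fake_send_batch, batch_size=100, workers=4)
print(indexed)
```
</pre>
<p>The point is the structure: documents go over the wire in batches rather than one at a time, and several threads keep the server busy at once.</p>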
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2011/03/07/some-lucenesolr-freebies/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache Lucene 2.9.4 and 3.0.3 Released</title>
		<link>http://www.lucidimagination.com/blog/2010/12/03/apache-lucene-2-9-4-and-3-0-3-released/</link>
		<comments>http://www.lucidimagination.com/blog/2010/12/03/apache-lucene-2-9-4-and-3-0-3-released/#comments</comments>
		<pubDate>Fri, 03 Dec 2010 13:02:37 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Lucene]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2759</guid>
		<description><![CDATA[<p>The Apache Lucene community has just released versions 2.9.4 and 3.0.3.  Here is the release announcement:</p>
<blockquote><p>Both releases fix bugs in the previous versions:</p>
<ul>
<li> <a href="http://lucene.apache.org/java/2_9_4/">2.9.4</a> is a bugfix release for the Lucene Java 2.x series, based on Java 1.4.</li>
<li> <a href="http://lucene.apache.org/java/3_0_3/">3.0.3</a> has the same bug fix level but is for the Lucene Java 3.x series, based on Java 5.</li>
</ul>
<p>New users of Lucene are advised to use version 3.0.3 for new developments, because it has a </p>&#8230;</blockquote>]]></description>
			<content:encoded><![CDATA[<p>The Apache Lucene community has just released versions 2.9.4 and 3.0.3.  Here is the release announcement:</p>
<blockquote><p>Both releases fix bugs in the previous versions:</p>
<ul>
<li> <a href="http://lucene.apache.org/java/2_9_4/">2.9.4</a> is a bugfix release for the Lucene Java 2.x series, based on Java 1.4.</li>
<li> <a href="http://lucene.apache.org/java/3_0_3/">3.0.3</a> has the same bug fix level but is for the Lucene Java 3.x series, based on Java 5.</li>
</ul>
<p>New users of Lucene are advised to use version 3.0.3 for new developments, because it has a clean, type-safe API.</p>
<p><strong>This release contains numerous bug fixes and improvements since 2.9.3 / 3.0.2, including:</strong></p>
<ul>
<li>a memory leak in IndexWriter exacerbated by frequent commits</li>
<li>a file handle leak in IndexWriter when near-real-time readers are opened with compound file format enabled</li>
<li>a rare index corruption case on disk full</li>
<li>NumericRangeQuery / NumericRangeFilter sometimes returning incorrect results with bounds near Long.MIN_VALUE and Long.MAX_VALUE</li>
<li>various thread safety issues</li>
<li>Lucene 2.9.4 can now also read indexes created by 3.0.x</li>
</ul>
<p>Both releases are fully compatible with the corresponding previous versions. We strongly recommend upgrading to 2.9.4 if you are using 2.9.x; and to 3.0.3 if you are using 3.0.x.</p>
<p>See <a href="http://lucene.apache.org/java/3_0_3/changes/Changes.html">3.0.3 CHANGES</a> and <a href="http://lucene.apache.org/java/2_9_4/changes/Changes.html">2.9.4 CHANGES</a> for details. <strong>Binary and source distributions are available <a href="http://www.apache.org/dyn/closer.cgi/lucene/java/">here</a>.</strong> Maven artifacts are available <a href="http://repo1.maven.org/maven2/org/apache/lucene/">here</a>.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/12/03/apache-lucene-2-9-4-and-3-0-3-released/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>OpenNLP moving to Apache</title>
		<link>http://www.lucidimagination.com/blog/2010/12/02/opennlp-moving-to-apache/</link>
		<comments>http://www.lucidimagination.com/blog/2010/12/02/opennlp-moving-to-apache/#comments</comments>
		<pubDate>Fri, 03 Dec 2010 02:12:51 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[opennlp]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2756</guid>
		<description><![CDATA[<p>For those of you who are looking to enhance your search (and other) applications with Natural Language Processing capabilities like named entity extraction, parsing, sentence detection and other techniques, do take note that the long-standing Sourceforge project, OpenNLP, has entered incubation at the <a href="http://www.apache.org">Apache Software Foundation</a>.  I think this is a great move for the project as it will no doubt bring greater contributions and attention from a wider audience leading to &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>For those of you who are looking to enhance your search (and other) applications with Natural Language Processing capabilities like named entity extraction, parsing, sentence detection and other techniques, do take note that the long-standing Sourceforge project, OpenNLP, has entered incubation at the <a href="http://www.apache.org">Apache Software Foundation</a>.  I think this is a great move for the project as it will no doubt bring greater contributions and attention from a wider audience, leading to many new enhancements and bug fixes.</p>
<p>We are still in the very early stages of moving things over from Sourceforge, but do keep your eyes on the Apache Incubator site and the OpenNLP site for news of when the new site is live and when the code has officially moved.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/12/02/opennlp-moving-to-apache/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

