<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; Lucene Connector Framework</title>
	<atom:link href="http://www.lucidimagination.com/blog/category/lucene-connector-framework/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>The Apache Lucene Ecosystem: My View of 2010</title>
		<link>http://www.lucidimagination.com/blog/2010/12/27/the-apache-lucene-ecosystem-my-view-of-2010/</link>
		<comments>http://www.lucidimagination.com/blog/2010/12/27/the-apache-lucene-ecosystem-my-view-of-2010/#comments</comments>
		<pubDate>Mon, 27 Dec 2010 15:54:11 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[Droids]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucene Connector Framework]]></category>
		<category><![CDATA[LucidWorks]]></category>
		<category><![CDATA[Lucy]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[ManifoldCF]]></category>
		<category><![CDATA[nutch]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[PyLucene]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Tika]]></category>
		<category><![CDATA[ZooKeeper]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2809</guid>
		<description><![CDATA[<p>After a week off to enjoy time with my family, I thought I would kick off the last week of 2010 with a look back at the year as it relates to the Apache Lucene ecosystem.  For anyone who follows the amalgamation of projects that I like to call the Lucene Ecosystem (the Apache projects: Lucene, Solr, Nutch, Mahout, Tika, PyLucene, Lucy, Lucene.NET, Droids, ManifoldCF &#8212; Lucene Connector Framework, OpenNLP and UIMA) you know it &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>After a week off to enjoy time with my family, I thought I would kick off the last week of 2010 with a look back at the year as it relates to the Apache Lucene ecosystem.  For anyone who follows the amalgamation of projects that I like to call the Lucene Ecosystem (the Apache projects: Lucene, Solr, Nutch, Mahout, Tika, PyLucene, Lucy, Lucene.NET, Droids, ManifoldCF &#8212; Lucene Connector Framework, OpenNLP and UIMA) you know it has been an amazingly busy and fruitful year.  Instead of going through each project like <a href="http://www.lucidimagination.com/blog/2009/12/24/the-apache-lucene-ecosystem-my-view-of-2009/">last year&#8217;s review</a>, I&#8217;m just going to be a bit less formal and hit on the highlights as I see them.</p>
<p>Before I dig in too much, though, a special thanks to all our customers at Lucid Imagination as well as to my coworkers.  I&#8217;m coming up on 15 years out in the &#8220;real world&#8221; and I can honestly say I&#8217;ve never enjoyed what I do as much as I do here and that even accounts for the normal rough patches one goes through in any job.  As an engineer, there are few things as cool as getting to work with customers who are not only using, but pushing your work/project/product on a daily basis to do new and interesting things (I think this is a direct result of the project being Open Source, which I believe has an inherently <a href="http://www.lucidimagination.com/blog/2009/04/20/lucene-open-source-and-the-cost-of-experimentation/">lower cost of experimentation</a>).  I&#8217;ve been fortunate enough to meet and talk with many people doing all kinds of things with Lucene and Solr ranging from the &#8220;mundane&#8221; of basic keyword search to those building next generation search capabilities at incredible scale.  Through it all, I&#8217;m constantly amazed at the flexibility and efficiency of Lucene and Solr.  For instance, I&#8217;ve been working with one customer now whose Solr-based solution (for the exact same content) will use ~50% less hardware and will have an index that is 1/6 the size of their FAST index all while saving them major dinero.</p>
<p>Speaking of Lucid, one of the highlights of the year for us that relates directly to Lucene and Solr is the launch of our enterprise version: <a href="http://www.lucidimagination.com/lwe/download">LucidWorks Enterprise</a>.   I like to think of it as Apache Solr with a whole lot of Lucid expertise on how to use Solr baked in and topped off with other features and functionality to make building search applications easier.</p>
<p>OK, time to move on to the open source projects&#8230;</p>
<ol>
<li>Without a doubt, the biggest news of the year is the merging of the Lucene and Solr code bases as well as the &#8220;graduation&#8221; of several subprojects to Apache Software Foundation Top Level Projects (TLP).  The graduating projects are <a href="http://tika.apache.org">Tika</a>, <a href="http://nutch.apache.org">Nutch</a>, and <a href="http://mahout.apache.org">Mahout</a>.  We also spun Lucy (a C port) out to the Incubator, where it is working on its own community.  These moves were primarily done to focus the project management on a single code base, but they also demonstrate the project has reached a level of maturity at the ASF.  The move also has the side benefit of bringing each project higher visibility.</li>
<li>I&#8217;m particularly excited about the addition of <a href="http://www.lucidimagination.com/blog/2010/12/02/opennlp-moving-to-apache/">OpenNLP to the Apache</a> umbrella.  OpenNLP is a nice open source Java project for natural language processing that has lived at SourceForge for quite some time.  I would expect development to grow quite a bit under the ASF community-based model.  Also, integrating OpenNLP with Solr and Lucene is pretty easy to do.  I would be remiss if I didn&#8217;t also give a nod to the addition of the <a href="http://incubator.apache.org/connectors">ManifoldCF</a> project to the ASF.  ManifoldCF will help unlock content in SharePoint, Documentum and other repositories for users of Lucene and Solr.</li>
<li>Lucene&#8217;s trunk code base now implements our &#8220;Flex APIs&#8221;, which should allow users to have near total control over what goes in the index as well as alternate compression techniques, different scoring models, etc.  See Michael McCandless&#8217; excellent <a href="http://www.lucidimagination.com/files/file/LuceneRev_McCandless_FunWithFlex.pdf">talk at Lucene Revolution</a> for more details.</li>
<li>With all the location-aware devices and capabilities on the market, geo-spatial search is a hot topic, and Lucene and Solr have been adding quite a few capabilities in this regard, including the ability to filter, boost and sort results based on location information in documents.  See Solr&#8217;s <a href="http://wiki.apache.org/solr/SpatialSearch">Spatial Search Wiki page</a> for more info as well as several of my <a href="http://www.lucidimagination.com/search/?q=spatial#/s:lucid/li:blogs">past blog posts</a>.</li>
<li>Of course, everyone was abuzz about the cloud this year.  For Solr, this translates into greater efforts to make Solr easier to scale to very large installations (100s to 1000s of nodes and billions and billions of documents) via the <a href="http://wiki.apache.org/solr/SolrCloud">Solr Cloud project that Yonik Seeley and Mark Miller have been spearheading</a>.</li>
<li>On the user side, one of the biggest pieces of buzz this year related to Lucene was the migration of Twitter search to Lucene.  At 1 billion queries per day and 50 million posts per day (all indexed and searchable in near real time), Twitter&#8217;s search system certainly has its work cut out for it.  However, as Michael Busch <a href="http://www.lucidimagination.com/events/revolution2010/videos/mbusch">outlined at Lucene Revolution</a>, Apache Lucene was up to the task!  Naturally, there were lots of other companies that migrated to Solr and Lucene as well.  Have you <a href="http://www.lucidimagination.com/enterprise-search-solutions/case-studies">shared your use case</a>?</li>
</ol>
<p>Well, I&#8217;ve no doubt missed a bunch of other things, but those items, to me, are some of the bigger highlights.  Looking forward, there are some other exciting things coming to Lucene and Solr.  In particular, I&#8217;m working on adding language identification, related searches and point-in-polygon filtering to Solr.  I would also expect we will release Lucene/Solr 3.1 fairly soon, too, but you can&#8217;t pin me down on a date just yet.</p>
<p>Here&#8217;s hoping you all have Happy Holidays and a Happy New Year!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/12/27/the-apache-lucene-ecosystem-my-view-of-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Apache Lucene EuroCon Agenda &#8211; The Revolution is On!</title>
		<link>http://www.lucidimagination.com/blog/2010/04/22/apache-lucene-eurocon-agenda-the-revolution-is-on/</link>
		<comments>http://www.lucidimagination.com/blog/2010/04/22/apache-lucene-eurocon-agenda-the-revolution-is-on/#comments</comments>
		<pubDate>Thu, 22 Apr 2010 11:09:33 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucene Connector Framework]]></category>
		<category><![CDATA[Lucid Imagination]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[nutch]]></category>
		<category><![CDATA[Open Relevance Project]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Tika]]></category>
		<category><![CDATA[ZooKeeper]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1965</guid>
		<description><![CDATA[<p>After reviewing a lot of great talk proposals, we&#8217;ve announced the  agenda for Apache Lucene Eurocon: <a href="http://lucene-eurocon.org/agenda.html">Apache Lucene EuroCon &#8211;  Europe&#8217;s Premier Lucene and Solr Search User Conference</a>.</p>
<p>One  of the things I really like about this agenda is it is a great mix of  basics, use cases from all over the search map (CMS, news, social media,  advertising), business decisions (see last list and next list) and advanced topics  (NLP, collab filtering, machine &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>After reviewing a lot of great talk proposals, we&#8217;ve announced the  agenda for Apache Lucene Eurocon: <a href="http://lucene-eurocon.org/agenda.html">Apache Lucene EuroCon &#8211;  Europe&#8217;s Premier Lucene and Solr Search User Conference</a>.</p>
<p>One  of the things I really like about this agenda is it is a great mix of  basics, use cases from all over the search map (CMS, news, social media,  advertising), business decisions (see last list and next list) and advanced topics  (NLP, collab filtering, machine learning, advanced visualization, multilingual).   Moreover, the content, even though it is centered in Lucene, goes well  beyond just being about Lucene and is really about search, in all of its power and  glory.  It&#8217;s about real users with real needs getting real problems  solved using the Lucene ecosystem.  Oh, and by the way, those users are doing it at scale!  Big scale.</p>
<p>That&#8217;s powerful stuff,  because, in case you hadn&#8217;t noticed (shh, it&#8217;s our little secret), there is a revolution going on in search.  (Funny how that line coincides with Lucid&#8217;s frontman, Eric Gries, giving  a  talk titled &#8220;The Search Revolution&#8221;.)</p>
<p>Are you a part of the revolution?  See you in <a href="http://lucene-eurocon.org/index.html">Prague</a> in mid-May.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/04/22/apache-lucene-eurocon-agenda-the-revolution-is-on/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Apache Lucene Connector Framework now in Incubation at the ASF</title>
		<link>http://www.lucidimagination.com/blog/2010/01/20/apache-lucene-connector-framework-now-in-incubation-at-the-asf/</link>
		<comments>http://www.lucidimagination.com/blog/2010/01/20/apache-lucene-connector-framework-now-in-incubation-at-the-asf/#comments</comments>
		<pubDate>Wed, 20 Jan 2010 20:16:13 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Lucene Connector Framework]]></category>
		<category><![CDATA[Mahout]]></category>
		<category><![CDATA[nutch]]></category>
		<category><![CDATA[PyLucene]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[Tika]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1509</guid>
		<description><![CDATA[<h1>Short Version</h1>
<p>The Apache Lucene Connector Framework project has officially entered incubation.  LCF, for short, is going to be a framework for connecting to content repositories like SharePoint, Documentum, etc. and will make it easy to hook into Lucene, Solr, Nutch, Mahout and Tika, while, of course, remaining agnostic of the final destination of the data.  See the <a href="http://incubator.apache.org/connectors/">Connectors website</a> and the <a href="http://wiki.apache.org/incubator/LuceneConnectorFrameworkProposal">original proposal</a> for more info.  Help wanted!</p>
<h1>Long Version</h1>
<h2>Background</h2>
<p>A while back, <a href="http://www.metacarta.com">MetaCarta</a>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<h1>Short Version</h1>
<p>The Apache Lucene Connector Framework project has officially entered incubation.  LCF, for short, is going to be a framework for connecting to content repositories like SharePoint, Documentum, etc. and will make it easy to hook into Lucene, Solr, Nutch, Mahout and Tika, while, of course, remaining agnostic of the final destination of the data.  See the <a href="http://incubator.apache.org/connectors/">Connectors website</a> and the <a href="http://wiki.apache.org/incubator/LuceneConnectorFrameworkProposal">original proposal</a> for more info.  Help wanted!</p>
<h1>Long Version</h1>
<h2>Background</h2>
<p>A while back, <a href="http://www.metacarta.com">MetaCarta</a>, a spatial search company, approached us about open sourcing their internally developed Connector Framework at the <a href="http://www.apache.org">Apache Software Foundation</a>.  After several discussions and a whole bunch of legwork getting a proposal together, the LCF is now officially launched in the <a href="http://incubator.apache.org/">Apache Incubator</a>!  We&#8217;ve already got a great roster of committers lined up and are working to incorporate the software grant from MetaCarta, from which we can build out a first release, so stay tuned!  Lucid Imagination, of course, is a big supporter of this project and we look forward to its success!</p>
<h2>What is a Connector Framework?</h2>
<p>To quote the proposal:</p>
<blockquote><p>[The Lucene] Connector Framework is an extendible [sic] incremental crawler, which uses a database to manage configuration and crawl history, and provides reasonably high performance in accessing content in multiple repositories for the main purpose of search engine indexing. Connector Framework also establishes a repository-specific security model which can be used to limit search user access to repository content based on a user&#8217;s identity. Connector Framework also includes existing connectors and authorities for:</p>
<ul>
<li>File system</li>
<li>Windows shares</li>
<li>JDBC-supported databases</li>
<li>RSS feeds</li>
<li>General websites</li>
<li>LiveLink [from OpenText]</li>
<li>Documentum [from EMC]</li>
<li>SharePoint [from Microsoft]</li>
<li>Meridio [from Meridio]</li>
<li>Memex [from Memex]</li>
<li>FileNet [from IBM]</li>
</ul>
</blockquote>
<p>There are two pieces in particular to highlight in the quote.  First of all, it&#8217;s an extensible framework, meaning new connectors can be added without application developers having to write &#8220;one-off&#8221; code just for that connector.  For anyone who&#8217;s lived that pain, you know firsthand what I mean.  In fact, I&#8217;ve already heard from others who are thinking of contributing their connectors for other data stores as well!  Second, the framework accounts for repository-specific security.  In corporate environments, this is vital to making sure that the right people, and only the right people, have access to the right information at the right time.</p>
<h2>Why is this important?</h2>
<p>Many, many search engines, not to mention many other applications, have either rolled their own connectors or bought a company that provides them.  Connectors, in some situations, are the cost of entry into certain markets, but are rarely the feature that seals the deal.  By making these open source, we can all share the cost of maintaining them while increasing the quality of a piece of software well beyond what any one company can achieve.  Beyond that, we hope the repository companies will also step up and contribute (some are already quite open), as making it easier to access these repositories will no doubt lead to more applications, which of course should mean more sales for said companies.</p>
<h2>How can you contribute?</h2>
<p>For starters, subscribe to the <a href="http://incubator.apache.org/connectors/mail.html">mailing lists</a>.  Then check out the <a href="http://cwiki.apache.org/confluence/display/CONNECTORS/HowToContribute">How To Contribute page</a> on the Wiki.  Beyond that, chip in with your connector use cases on the mailing lists and be a part of the community.</p>
<h2>What&#8217;s next?</h2>
<p>First off, the community will have to process the software grant from MetaCarta and then commit the code to LCF&#8217;s Subversion <a href="https://svn.apache.org/repos/asf/incubator/lcf">repository</a>.  From there, we&#8217;ll do just as any Apache project does and look to build out not only the code, but also the community, all on the path to graduating from the Incubator and taking our place as a full-fledged Lucene subproject.  Keep your eyes here and on the mailing lists and websites for more information in the future!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/01/20/apache-lucene-connector-framework-now-in-incubation-at-the-asf/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

