January 20, 2010

Apache Lucene Connector Framework now in Incubation at the ASF

Posted by Grant Ingersoll

Short Version

The Apache Lucene Connector Framework project has officially entered incubation. LCF, for short, is going to be a framework for connecting to content repositories like SharePoint, Documentum, etc. It will make it easy to hook into Lucene, Solr, Nutch, Mahout, and Tika while, of course, remaining agnostic of the final destination of the data. See the Connectors website and the original proposal for more info. Help wanted!

Long Version

Background

A while back, MetaCarta, a spatial search company, approached us about open sourcing their internally developed Connector Framework at the Apache Software Foundation. After several discussions and a whole bunch of legwork getting a proposal together, the LCF is now officially launched in the Apache Incubator! We’ve already got a great roster of committers lined up and are working to incorporate the software grant from MetaCarta, from which we can build out a first release, so stay tuned! Lucid Imagination, of course, is a big supporter of this project and we look forward to its success!

What is a Connector Framework?

To quote the proposal:

[The Lucene] Connector Framework is an extendible [sic] incremental crawler, which uses a database to manage configuration and crawl history, and provides reasonably high performance in accessing content in multiple repositories for the main purpose of search engine indexing. Connector Framework also establishes a repository-specific security model which can be used to limit search user access to repository content based on a user’s identity. Connector Framework also includes existing connectors and authorities for:

  • File system
  • Windows shares
  • JDBC-supported databases
  • RSS feeds
  • General websites
  • LiveLink [from OpenText]
  • Documentum [from EMC]
  • SharePoint [from Microsoft]
  • Meridio [from Meridio]
  • Memex [from Memex]
  • FileNet [from IBM]

There are two pieces in particular to highlight in the quote. First of all, it’s an extensible framework, meaning new connectors can be added without application developers having to write “one-off” code just for that connector. Anyone who’s lived that pain knows firsthand what I mean. In fact, I’ve already heard from others who are thinking of contributing their connectors for other data stores as well! Second, the framework accounts for repository-specific security. In corporate environments, this is vital to making sure that the right people, and only the right people, have access to the right information at the right time.
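
To make those two points concrete, here is a minimal sketch in Python of what a pluggable, incremental, security-aware crawler could look like. It is not LCF's actual API: every name in it (`Connector`, `Document`, `IncrementalCrawler`, `visible_to`, the `history` table) is hypothetical and invented for illustration, and an in-memory SQLite database stands in for the database the proposal says manages crawl history.

```python
import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Document:
    doc_id: str
    version: str        # assumed to change whenever the content changes
    content: str
    allowed: frozenset  # access tokens permitted to see this document


class Connector(ABC):
    """Hypothetical plug-in point: one subclass per repository type."""

    @abstractmethod
    def list_documents(self) -> Iterable[Document]:
        ...


class InMemoryConnector(Connector):
    """Toy repository standing in for a file system, SharePoint, etc."""

    def __init__(self, docs):
        self.docs = list(docs)

    def list_documents(self):
        return list(self.docs)


class IncrementalCrawler:
    """Remembers the last seen version of each document in a database."""

    def __init__(self, connector, db=None):
        self.connector = connector
        self.db = db if db is not None else sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS history "
            "(doc_id TEXT PRIMARY KEY, version TEXT)"
        )

    def crawl(self):
        """Return only documents that are new or changed since the last crawl."""
        changed = []
        for doc in self.connector.list_documents():
            row = self.db.execute(
                "SELECT version FROM history WHERE doc_id = ?", (doc.doc_id,)
            ).fetchone()
            if row is None or row[0] != doc.version:
                changed.append(doc)
                self.db.execute(
                    "INSERT OR REPLACE INTO history (doc_id, version) "
                    "VALUES (?, ?)",
                    (doc.doc_id, doc.version),
                )
        self.db.commit()
        return changed


def visible_to(docs, user_tokens):
    """Keep only documents whose access tokens overlap the user's tokens."""
    tokens = frozenset(user_tokens)
    return [d for d in docs if d.allowed & tokens]
```

In this sketch, supporting a new repository means writing one more `Connector` subclass while the crawler and the security filter stay untouched, which is the "no one-off code" idea in miniature. A real framework would also have to handle deletions, scheduling, and each repository's own authorization scheme, none of which is modeled here.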

Why is this important?

Many, many search engines, not to mention many other applications, have either rolled their own connectors or bought a company that provides them. Connectors, in some situations, are the cost of entry into certain markets, but are rarely the feature that seals the deal. By making these open source, we can all share the cost of maintaining them while increasing the quality of the software well beyond what any one company can achieve. Beyond that, we hope the repository companies will also step up and contribute (some are already quite open), as making it easier to access these repositories will no doubt lead to more applications, which of course should mean more sales for said companies.

How can you contribute?

For starters, subscribe to the mailing lists.  Then check out the How To Contribute page on the Wiki.  Beyond that, chip in with your connector use cases on the mailing lists and be a part of the community.

What’s next?

First off, the community will have to process the software grant from MetaCarta and then commit the code to LCF’s Subversion repository. From there, we’ll do what any Apache project does and look to build out not only the code, but also the community, all on the path to graduating from the Incubator and taking our place as a full-fledged Lucene subproject. Keep your eyes here and on the mailing lists and websites for more information in the future!

Category: apache, Lucene, Lucene Connector Framework, Mahout, nutch, PyLucene, Solr, Tika
