Apache Lucene
Apache Lucene is a free, open source search library written in Java and released under the liberal Apache Software License. The license lets users modify or embed the technology as they see fit, and keep proprietary, sell, or redistribute any resulting product. Lucene provides the search engine libraries at the heart of Apache Solr, the Lucene-based search server.
As a low-level library, Apache Lucene offers tremendous flexibility and has historically been embedded deep inside many powerful applications. If you want all of your resources controlled exclusively by Java API calls that you write, Lucene may be a fit. Experienced programmers can build Lucene directly into a native Java application and control its large set of sophisticated features through low-level access to data and state, for example byte-level manipulation of segments or intervention in data I/O. Investment at this lower level enables development of extremely sophisticated, cutting-edge text search and retrieval capabilities.
For most search application development, Apache Solr is the logical starting point, as it encapsulates all of Lucene's search functions; you might think of Solr as the 'serverization' of Lucene. Java programmers working directly with Lucene have often reported that they find Solr to contain “the same features I was going to build myself as a framework for Lucene, but already very well implemented.” Once you start with Solr and find yourself using many of the features it provides out of the box, you will likely be better off using Solr’s well-organized extension mechanisms than starting from scratch with Apache Lucene.
If you're an experienced open source application developer using Solr, we recommend you download the LucidWorks Certified Distribution for Solr; if you plan to deploy Solr into a business-critical environment and/or production deployment, we recommend you work with LucidWorks Enterprise.
Lucene is written entirely in Java, though .NET and other versions are now available. Lucene has a large number of active contributors and thousands of installations, including production applications at AOL, Apple, CNET, Comcast Interactive Media, IBM, LinkedIn, Monster, MySpace, Netflix, Technorati and Wikipedia. Lucene is full-featured and provides:
- Speed — sub-second query performance for most queries
- Strong out-of-the-box relevancy ranking: as good as or better than the best commercial competitors
- Complete query capabilities: keyword, Boolean and +/- queries, proximity operators, wildcards, fielded searching, term/field/document weights, find-similar, spell-checking, multi-lingual search and more
- Full results processing, including sorting by relevancy, date or any field, dynamic summaries and hit highlighting
- Portability: runs on any platform supporting Java, and indexes are portable across platforms; you can build an index on Linux, copy it to a Microsoft Windows machine and search it there
- Scalability — there are production applications in the hundreds of millions and billions of documents/records
- Low overhead indexes and rapid incremental indexing
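To make the keyword and +/- query operators in the list above more concrete, here is a toy sketch in plain Java. It is not Lucene code and does not use the Lucene API: `ToyIndex`, `add` and `search` are hypothetical names invented for this illustration, and the sketch leaves out ranking, analyzers and on-disk segments entirely. It assumes a simplified rule in which `+term` must be present, `-term` must be absent, and bare terms are optional unless no `+` term is given.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index: an illustration of the idea only, NOT Lucene's implementation.
class ToyIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    // Adds a document and returns its id. New documents only append to the
    // postings sets; existing entries are never rewritten.
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Evaluates a query such as "+lucene -solr search" (assumed, simplified rules):
    //   +term must be present, -term must be absent, a bare term is optional.
    // With no +term in the query, a document must match at least one bare term.
    List<Integer> search(String query) {
        TreeSet<Integer> required = null;
        TreeSet<Integer> optional = new TreeSet<>();
        TreeSet<Integer> excluded = new TreeSet<>();
        for (String clause : query.trim().split("\\s+")) {
            if (clause.isEmpty()) {
                continue;
            }
            char op = clause.charAt(0);
            String term = (op == '+' || op == '-') ? clause.substring(1) : clause;
            TreeSet<Integer> hits =
                    postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
            if (op == '+') {
                if (required == null) {
                    required = new TreeSet<>(hits);   // first required term
                } else {
                    required.retainAll(hits);         // intersect with later ones
                }
            } else if (op == '-') {
                excluded.addAll(hits);
            } else {
                optional.addAll(hits);
            }
        }
        TreeSet<Integer> result = (required != null) ? required : optional;
        result.removeAll(excluded);
        return new ArrayList<>(result);               // ids in ascending order
    }

    // Lowercases the text and splits it on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

In this sketch, after adding "Lucene is a search library" and "Solr is a search server built on Lucene" in that order, `search("+lucene -solr")` returns only the id of the first document. A real Lucene index adds scoring, text analysis and segment management on top of this basic term-to-document mapping.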
The open source community is constantly updating, patching, and refining the Lucene source code. For best results, we recommend you download and use the LucidWorks Certified Distribution for Lucene, a well-packaged, integrated, supported release, with the latest features tested and validated by the search experts at Lucid Imagination.
