Books and Publications
Lucene in Action, Second Edition, by Erik Hatcher, Otis Gospodnetic, and Michael McCandless. Lucene in Action, Second Edition, completely revises and updates the best-selling first edition and remains the authoritative book on Lucene. This book shows you how to index your documents, including types such as MS Word, PDF, HTML, and XML. It introduces you to searching, sorting, and filtering, and covers the numerous changes to Lucene since the first edition. All source code has been updated to current Lucene 2.3 APIs.
Lucene in Action, by Erik Hatcher and Otis Gospodnetic. Lucene is a gem in the open-source world: a highly scalable, fast search engine. It delivers performance and is disarmingly easy to use. Lucene in Action is the authoritative guide to Lucene. It describes how to index your data, including types you definitely need to know such as MS Word, PDF, HTML, and XML. It introduces you to searching, sorting, filtering, and highlighting search results.
Solr 1.4 Enterprise Search Server, by David Smiley and Eric Pugh. This book first gives you a quick overview of Solr, and then gradually takes you from basic to advanced features that enhance your search. It starts off by discussing Solr and helping you understand how it fits into your architecture, where all databases and document/web crawlers fall short and Solr shines. The main part of the book is a thorough exploration of nearly every feature that Solr offers. To keep this interesting and realistic, we use a large open source set of metadata about artists, releases, and tracks, courtesy of the MusicBrainz.org project. Using this data as a testing ground for Solr, you will learn how to import it in various ways, from CSV to XML to database access.
Taming Text
Mahout in Action, by Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. Mahout in Action explores machine learning through Apache's scalable machine learning project, Mahout. Following real-world examples, it introduces practical use cases and then illustrates how Mahout can be applied to solve them. It places particular focus on issues of scalability and on how to apply these techniques against large data sets using the Apache Hadoop framework.
Machine Learning in Action, Early Access Edition, by Peter Harrington. Machine Learning in Action, Early Access Edition, is a unique book that blends the foundational theories of machine learning with the practical realities of building tools for everyday data analysis. In it, you'll use the flexible Python programming language to build programs that implement algorithms for data classification, forecasting, recommendations, and higher-level features like summarization and simplification.
ManifoldCF in Action, by Karl Wright. No matter how exciting a search engine might be, it's worthless unless it has data to index. ManifoldCF is an open source framework for pulling content out of a repository and sending it on to targets such as Solr via a plug-in style, connector-based architecture. ManifoldCF includes connectors for numerous commercial and open source data sources, including Documentum, SharePoint, JDBC, and RSS. ManifoldCF in Action is a comprehensive tutorial and reference that shows you how to integrate search with enterprise-level document repositories using ManifoldCF. The book begins with an architectural overview of ManifoldCF and how it fits into your application infrastructure.
Java Development with Ant, by Erik Hatcher and Steve Loughran. "Overall, Java Development with Ant is an excellent resource...rich in valuable information that is well organized and clearly presented. ...written by Erik Hatcher and Steve Loughran who are both committers to the Apache Ant project, is a great resource for anyone wishing to learn how to integrate Ant into his personal set of best practices for software configuration management solutions." -- Slashdot.org







