
Technical Articles
Setting up Apache Solr in Eclipse
by Amit Nithianandan
Apache Solr is a powerful software package that lets you build your own search engine quickly. It is written entirely in Java with Lucene at its core, and it can run inside any servlet container such as Tomcat or Jetty. Eclipse is an IDE that makes developing Java applications much easier thanks to features such as code completion and refactoring, not to mention the many free plugins available to simplify development further.
Solr and RDBMS: The basics of designing your application for the best of both
by Amit Nithianandan
The Relational Database (RDBMS) is the cornerstone of data persistence in software development. While modern data workloads have recently put the RDBMS under fire for some of its scalability and speed constraints, its longevity, portability, abundance of well-written GUI management tools, and ease of querying still make it the application data storage mechanism of choice. And tabular data representation (rows of records and columns of fields) is an intuitive way to organize many transactional data types.
As a result, it's natural to think about the relational model when trying to organize the data for your search application. At the same time, there are some real advantages to using an inverted-index based system such as Solr/Lucene to design the search service for your application, so users can quickly wade through mountains of data. So what does the RDBMS do best, and what should you rely on Solr for?
Lucene or Solr: Choosing the right search development platform
by Lucid Imagination
The great improvements in the capabilities of Lucene and Solr open source search technology have created rapidly growing interest in using them as alternatives to other search applications. As is often the case with open-source technology, online community documentation provides rich details on features and variations, but does little to provide explicit direction on which technologies would be the best choice. So when is Lucene preferable to Solr and vice versa?
Fanfeedr.com: User-driven Search Relevance and Content Aggregation
Fanfeedr.com is a real-time, personalized sports aggregation website with a social networking layer on top. It now aggregates more than 3,500 sources providing information on more than 55,000 athletes and over 4,000 teams, including those from over 1,700 colleges and universities across 15 different sports. By aggregating data in a database but using Solr to index the documents and the relationships between them, Fanfeedr can both deliver highly relevant content and keep pace with the rapid growth and variety of incoming content.
Using Solr Search with RDBMS
by Altan Khendup
In many shops, some of the most common queries run against large-scale RDBMS systems such as Oracle are pattern searches within ranges of criteria, typically targeted searches that users run to meet specific business needs. Writing standardized reports or simple relational queries can answer the questions, but such mechanisms can be inflexible and costly to maintain. A more efficient way to address these challenges is through the power of Solr.
Searching rich format documents stored in a DBMS
by Jonck van der Kogel
As companies gather more and more data, the ability to search this data is becoming increasingly important. Especially with legacy systems, this can sometimes be quite a challenge. One situation you might encounter is where documents in rich formats such as PDF, MS Word, Excel, and PowerPoint are stored as BLOBs in a SQL database. Your first reaction might be that making these documents searchable would be a lot of work, since Solr does not support such an import natively. But by using Solr's DataImportHandler and a custom Transformer, it actually becomes pretty easy and straightforward.
Technical Application Note: ilocal and JTeam build Context Aware Local Search with Solr
by ilocal and JTeam
As a leading online directory service site headquartered in the Netherlands, ilocal had stringent requirements for innovative functionality implemented quickly. Their legacy search technology was optimized for Web-centric searches, rather than accommodating the additional complexities of wide-ranging enterprise datasets. Working with JTeam, a Dutch company specializing in open source, they architected and jointly developed a new solution featuring: results ranking that combined complete flexibility with exquisite precision; scalability with low latency for users; and support for location-based searches and geo-tagged data.
Technical Application Note: How Zvents Local Search uses Solr to find you things to do
by Amit Nithian, Ivan Small, and Tony Barreca
Zvents aggregates and distributes local content about “things to do”, so users can discover events, entertainment, restaurants, etc., based on time and place. With two-level ranking and cache optimizations that improve both utilization and user experience, Zvents has achieved key innovations with Solr that deliver both optimal performance and more relevant results.
Crawling in Open Source, Part 1
by Sami Siren
This is the first of a two-part series of articles that will focus on open source web crawlers implemented in the Java programming language. The goal is to familiarize the reader with some basic concepts of crawling and also to dig deeper into some implementations such as Apache Nutch and Apache Droids. This first part covers the general concepts as well as Apache Nutch.
Marc Krellenstein Featured on 'Wizards of Search' on arnoldit.com
by Marc Krellenstein
Open source software continues to gain momentum. In part, savvy managers recognize that commercial software can lock an organization into a walled garden. When captured, some flexibility is lost. With mounting economic pressures, open source software can reduce or hold down certain costs such as annual licensing fees, mandatory certification programs, or access to some third party software because that software is not compatible with a proprietary solution. In search and content processing, open source search systems such as Lucene have become worthy challengers to some commercial and proprietary systems. IBM, for example, uses Lucene in its search products.
Search Engine versus DBMS
by Marc Krellenstein
Many database users wonder what a full text search engine can do that a database cannot. After all, most databases offer some semblance of text-based search, even if it often seems like an afterthought. At the same time, most search engines offer things like storage and some set manipulation logic. How's a user to decide what to do? In this article, Marc Krellenstein explores the benefits of a full text search engine in comparison to a database.
Scaling Lucene and Solr
by Mark Miller
While many Lucene/Solr applications will never outgrow a single, well-configured machine, the fact is, more and more applications are pushing beyond the single machine limit due to either index size or query volume. In discussing Lucene and Solr best practices for performance and scaling, Mark Miller explains how to get the most out of a single machine, as well as how to harness multiple machines to handle large indexes, large query volume, or both.
Debugging Relevance Issues in Search
by Grant Ingersoll
Many people focus purely on the speed of search, often neglecting the quality of the results produced by the system. In most cases, people test out some small set of queries, eyeball the top five or ten and then declare the system good enough. In other cases, they have a suite of test queries to run, but they are at a loss for how to fix any issues that arise.
Solving these relevance problems takes a systematic approach, a set of useful tools, and a dose of patience. This article will outline several approaches and tools. The patience part will come from knowing the problem is being looked at in a pragmatic way that will lead to a solution instead of a dead end.
