Technical Articles
Lucid Imagination's technical staff and partners have published several technical articles on using Solr, Lucene and LucidWorks Enterprise. The full list of the articles is below.
Lucid has also published several longer documents as whitepapers. Each whitepaper is free to download with registration.
Scaling Mahout
Apache Mahout's goal is to build scalable machine learning libraries under the Apache license. Leveraging Apache Hadoop where practical, Mahout implements a number of machine learning algorithms for classification, clustering and collaborative filtering. In the companion article, first published on IBM's developerWorks website, Lucid Imagination's Chief Scientist, Grant Ingersoll, walks readers through using Mahout locally and on Amazon EC2. The code and data sets used in the article are linked below.
Solr and LucidWorks: How to Choose the Right Platform
If the LucidWorks Platform is built on Solr, how do you know which one to use in your own circumstances? This article describes the differences among using straight Solr, the LucidWorks user interface, and the LucidWorks REST API to accomplish various common tasks, so you can see which fits your situation at a given moment.
In this article, we'll look at how to choose the right tool and understand the tradeoffs of working with one over another. We'll review some of the more common tasks you'll need to accomplish when building search into your applications, explain what they mean in each of the three environments, and compare working with straight Apache Solr, the LucidWorks web-based UI, and the LucidWorks REST API.
Search is Where Social Meets Enterprise
by David M. Fishman
With or without the influence of social media, enterprise data is taking on some of the same characteristics as social media. Data structure is mixed between structured, semi-structured and unstructured; it arrives at ever-accelerating rates, in near real time; demand is growing for access in real time; and it continues to span multiple, incompatible repositories. A common example: sales or service reps correspond with customers and each other by email, intermixed with filling in fields on a fixed form in CRM apps.
Setting up Apache Solr in Eclipse
This article refers to Solr 1.4. For the most current information, please visit http://wiki.apache.org/solr/HowToConfigureEclipse
Apache Solr is a powerful software package that allows you to develop your own search engine in no time. It is written purely in Java, uses Lucene at its core, and can run inside any servlet container such as Tomcat or Jetty. Eclipse is an IDE that makes developing Java applications much easier thanks to features such as code completion and refactoring, not to mention the many free plugins available to simplify development further.
Solr and RDBMS: The basics of designing your application for the best of both
The relational database (RDBMS) is the cornerstone of data persistence in software development. Modern data workloads have recently put the RDBMS under fire for some of its scalability and speed constraints, but its longevity, portability, abundance of well-written GUI management tools, and ease of querying still make it the application data storage mechanism of choice. Tabular data representation (rows of records and columns of fields) is also an intuitive way to organize many transactional data types.
As a result, it's natural to think about the relational model when trying to organize the data for your search application. At the same time, there are some real advantages to using an inverted-index-based system such as Solr/Lucene to design the search service for your application, so users can quickly wade through mountains of data. So what does the RDBMS do best, and what should you rely on Solr for?
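To make the inverted-index idea concrete, here is a minimal Python sketch. It is a toy illustration, not how Lucene or Solr is implemented: the sample rows, the whitespace tokenizer, and the function names `build_inverted_index` and `search` are all assumptions made for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenizer (an assumption; real analyzers do far more).
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample rows, as they might be read from a database table.
docs = {
    1: "Solr is a search server built on Lucene",
    2: "A relational database stores rows and columns",
    3: "Lucene builds an inverted index for fast search",
}
index = build_inverted_index(docs)
matches = search(index, "lucene search")
```

A lookup touches only the postings for the query terms instead of scanning every row, which is the property that lets a search engine answer free-text queries quickly over large collections.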
Lucene or Solr: Choosing the right search development platform
The great improvements in the capabilities of Lucene and Solr open source search technology have created rapidly growing interest in using them as alternatives to other search applications. As is often the case with open-source technology, online community documentation provides rich details on features and variations, but does little to provide explicit direction on which technologies would be the best choice. So when is Lucene preferable to Solr and vice versa?
Fanfeedr.com: User-driven Search Relevance and Content Aggregation
Fanfeedr.com is a real-time, personalized sports aggregation website with a social networking layer on top. It now aggregates more than 3,500 sources providing information on more than 55,000 athletes and over 4,000 teams, including those from over 1,700 colleges and universities across 15 different sports. By aggregating data in a database but using Solr to index the documents and the relationships between them, Fanfeedr can both deliver highly relevant content and keep pace with the rapid growth and variety of incoming content.
Solr Searching with RDBMS
In many shops, some of the most common queries run against large-scale RDBMSs such as Oracle are pattern searches within ranges of criteria: targeted searches that users run to answer specific business questions. Standardized reports or simple relational queries can answer these questions, but such mechanisms can be inflexible and costly to maintain. A more efficient way to address these challenges is through the power of Solr.
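As a rough illustration of a pattern search within a range, the Python sketch below builds the URL for a Solr select request that combines a range clause with a wildcard clause. The host, the field names (`order_total`, `customer_name`), and the helper name `build_select_url` are hypothetical; the sketch only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode

def build_select_url(base_url, range_field, low, high,
                     pattern_field=None, pattern=None, rows=10):
    """Build a Solr /select URL combining a range clause with an optional wildcard clause."""
    # Inclusive range clause in Solr query syntax: field:[low TO high]
    clauses = [f"{range_field}:[{low} TO {high}]"]
    if pattern_field and pattern:
        # Wildcard pattern clause, e.g. customer_name:smi*
        clauses.append(f"{pattern_field}:{pattern}")
    params = {"q": " AND ".join(clauses), "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical host and field names, for illustration only.
url = build_select_url("http://localhost:8983/solr",
                       "order_total", 100, 500,
                       pattern_field="customer_name", pattern="smi*")
```

Changing the criteria means changing only the arguments, which hints at why this approach can be more flexible than maintaining a separate report or SQL query for each question.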
Searching rich format documents stored in a DBMS
As companies gather more and more data, the ability to search this data is becoming increasingly important. Especially with legacy systems, this can sometimes be quite a challenge. One situation you might encounter is where documents in rich formats such as PDF or MS Word, Excel, and PowerPoint are stored as BLOBs in a SQL database. Your first reaction might be that indexing them would be a lot of work, since Solr does not support such an import natively. But by using Solr's DataImportHandler and a custom Transformer, it actually becomes pretty easy and straightforward.
Technical Application Note: ilocal and JTeam build Context Aware Local Search with Solr
As a leading online directory service headquartered in the Netherlands, ilocal had stringent requirements for innovative functionality implemented quickly. Their legacy search technology was optimized for Web-centric searches, rather than accommodating the additional complexities of wide-ranging enterprise datasets. Working with JTeam, a Dutch company specializing in open source, they architected and jointly developed a new solution featuring: results ranking that combined complete flexibility with exquisite precision; scalability with low latency for users; and support for location-based searches and geo-tagged data.
