Glossary

A

Apache Software Foundation

The non-profit open source software foundation where Lucene is hosted and maintained. See http://www.apache.org.

D

Document

The Lucene/Solr abstract representation of one or more units of content. Typically represents a file or a database record, but is ultimately user-defined. A Document consists of one or more Fields. A Document may be boosted in order to indicate its importance over other Documents. See also Field.

F

Field

The Lucene/Solr abstract representation of a single unit of content and metadata describing that content. Typically represents some piece of text or string in the original file or a column in a database, but is ultimately user-defined. A Field may be boosted in order to indicate its importance over other Fields. The metadata tells Lucene how to treat the content when it is added to the system.
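
The relationship between a Document and its Fields can be sketched as a small data structure. This is an illustration only, not the Lucene API: the class layout, the indexed and stored flags, and the default boost of 1.0 are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """One unit of content plus metadata describing how to treat it."""
    name: str
    value: str
    boost: float = 1.0     # assumed default: no extra importance
    indexed: bool = True   # hypothetical metadata flag: make the value searchable
    stored: bool = True    # hypothetical metadata flag: keep the original value


@dataclass
class Document:
    """A user-defined unit of content made up of one or more Fields."""
    fields: List[Field] = field(default_factory=list)
    boost: float = 1.0     # assumed default: no extra importance

    def add(self, f: Field) -> None:
        self.fields.append(f)


# A database record mapped to a Document: one Field per column.
doc = Document(boost=2.0)
doc.add(Field("title", "Glossary", boost=1.5))
doc.add(Field("body", "Definitions of common search terms", stored=False))
```

Here the Document as a whole is boosted over other Documents, and the title Field is boosted over the body Field.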

I

Information Need

See http://en.wikipedia.org/wiki/Information_need. A user's query is a representation (albeit truncated) of the user's information need.

Information Retrieval

The interdisciplinary study of searching for information in documents, databases and other repositories of data. Information Retrieval (IR) draws on computer science, mathematics, linguistics, psychology, library science and a host of other fields.

L

Lucene

A Java-based search library originally written by Doug Cutting. Lucene can be used to provide search and indexing capabilities to applications ranging from embedded systems to large-scale Internet search.

P

Precision

A statistical measure of the number of relevant documents returned by the system divided by the total number of documents returned in the result set. See Recall.

R

Recall

A statistical measure of the number of relevant documents returned by the system divided by the total number of relevant documents that exist in the collection. See Precision.
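
Precision and Recall, as defined above, can be computed for a single query with a few lines of code. This is a minimal sketch: the function name and the document identifiers are made up for the example, and returning 0.0 for an empty set is an assumed convention.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    returned: ids of the documents in the result set
    relevant: ids of all relevant documents in the collection
    """
    returned = set(returned)
    relevant = set(relevant)
    hits = returned & relevant  # relevant documents that were returned
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


returned = ["d1", "d2", "d3", "d4"]        # result set: 4 documents
relevant = ["d2", "d4", "d7", "d8", "d9"]  # collection holds 5 relevant documents
precision, recall = precision_recall(returned, relevant)
```

In this example two of the four returned documents are relevant, so precision is 2/4 = 0.5; two of the five relevant documents were found, so recall is 2/5 = 0.4.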

Relevance

From http://en.wikipedia.org/wiki/Relevance_(information_retrieval):

In the context of information science and information retrieval, relevance denotes how well a retrieved set of documents (or a single document) meets the information need of the user.

S

Solr

Apache Solr is a Java-based search server built on Lucene with many enterprise-ready features like caching, replication, multiple language bindings, faceting, and a REST-like protocol.

Stopword

A commonly occurring word, such as "the", "an" and "a", that often adds little value to a search.
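
Stopword removal can be illustrated with a short sketch. It does not use Lucene's analyzers: the whitespace tokenization, the lowercasing, and the three-word list (taken from the definition above) are simplifying assumptions.

```python
# Stopword list taken from the definition above; real lists are longer.
STOPWORDS = {"the", "an", "a"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split on whitespace, and drop stopwords."""
    return [token for token in text.lower().split() if token not in stopwords]


tokens = remove_stopwords("The quick fox jumped over a lazy dog")
# tokens == ["quick", "fox", "jumped", "over", "lazy", "dog"]
```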

V

Vector Space Model

From http://en.wikipedia.org/wiki/Vector_Space_Model:

An algebraic model for representing text documents (and any objects, in general) as vectors of identifiers, such as, for example, index terms. It is used in information filtering, information retrieval, indexing and relevancy rankings. Its first use was in the SMART Information Retrieval System.
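
The idea can be sketched by turning texts into term vectors and comparing them with cosine similarity. The choices here are assumptions made for illustration: raw term counts as vector weights, whitespace tokenization, and the sample texts. This is not how Lucene computes its scores.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


query = term_vector("lucene search")
doc_a = term_vector("lucene is a search library")
doc_b = term_vector("solr is a search server")

score_a = cosine_similarity(query, doc_a)  # shares "lucene" and "search"
score_b = cosine_similarity(query, doc_b)  # shares only "search"
```

Because doc_a shares two terms with the query and doc_b shares one, score_a is higher than score_b, so doc_a would rank first under this model.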