Glossary
A
- Apache Software Foundation
-
The non-profit open source software foundation where Lucene is hosted and maintained. See http://www.apache.org.
D
- Document
-
The Lucene/Solr abstract representation of one or more units of content. Typically represents a file or a database record, but is ultimately user-defined. A Document consists of one or more Fields. A Document may be boosted in order to indicate its importance over other Documents. See also Field.
F
- Field
-
The Lucene/Solr abstract representation of a single unit of content and metadata describing that content. Typically represents some piece of text or string in the original file or a column in a database, but is ultimately user-defined. A Field may be boosted in order to indicate its importance over other Fields. The metadata tells Lucene how to treat the content when it is added to the system.
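As a rough illustration of how the Document and Field entries relate, the following sketch models them as plain Python data classes. This is not the Lucene API: the class layout and the `indexed` and `stored` flags are hypothetical stand-ins for the "metadata describing that content", chosen only to make the relationship concrete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """One unit of content plus metadata describing how to treat it."""
    name: str
    value: str
    indexed: bool = True   # hypothetical flag: should the content be searchable?
    stored: bool = True    # hypothetical flag: should the original value be kept?
    boost: float = 1.0     # importance relative to other Fields


@dataclass
class Document:
    """A user-defined collection of one or more Fields."""
    fields: List[Field] = field(default_factory=list)
    boost: float = 1.0     # importance relative to other Documents

    def add(self, new_field: Field) -> "Document":
        self.fields.append(new_field)
        return self


# A Document standing in for a single file: one Field for its title,
# one for its path. The title is boosted; the path is kept but not searched.
doc = Document(boost=1.5)
doc.add(Field("title", "Getting started", boost=2.0))
doc.add(Field("path", "/docs/readme.txt", indexed=False))
```

Which pieces of a file or database record become Fields, and which flags they carry, is a modeling decision left to the user.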
I
- Information Need
-
See http://en.wikipedia.org/wiki/Information_need. A user's query is a representation (albeit truncated) of the user's information need.
- Information Retrieval
-
The interdisciplinary study of searching for information in documents, databases and other repositories of data. IR calls upon the fields of computer science, mathematics, linguistics, psychology, library science and a host of other fields.
L
P
- Precision
-
The fraction of the result set that is relevant: the number of relevant documents returned by the system divided by the total number of documents returned. See also Recall.
R
- Recall
-
The fraction of all relevant documents that the system found: the number of relevant documents returned by the system divided by the total number of relevant documents that exist in the collection. See also Precision.
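The Precision and Recall entries can be made concrete with a small worked example. The sketch below assumes the set of relevant documents for a query is already known (in practice this comes from human judgments); the function name and document IDs are made up for illustration.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a single query.

    returned: IDs of the documents in the result set.
    relevant: IDs of every relevant document in the collection.
    """
    returned = set(returned)
    relevant = set(relevant)
    hits = returned & relevant  # relevant documents that were actually returned

    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# The system returns four documents; three documents in the collection
# are relevant, and two of those appear in the result set.
precision, recall = precision_recall(
    returned=["doc1", "doc2", "doc3", "doc4"],
    relevant=["doc2", "doc4", "doc7"],
)
# precision is 2/4 and recall is 2/3
```

Returning more documents tends to raise recall while lowering precision, which is why the two measures are usually reported together.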
- Relevance
-
From http://en.wikipedia.org/wiki/Relevance_(information_retrieval):
In the context of information science and information retrieval, relevance denotes how well a retrieved set of documents (or a single document) meets the information need of the user.
S
V
- Vector Space Model
-
From http://en.wikipedia.org/wiki/Vector_Space_Model:
An algebraic model for representing text documents (and any objects, in general) as vectors of identifiers, such as, for example, index terms. It is used in information filtering, information retrieval, indexing and relevancy rankings. Its first use was in the SMART Information Retrieval System.
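To illustrate the idea, the sketch below represents a query and two documents as vectors of raw term counts and compares them with cosine similarity. Both choices are assumptions made for the example (the definition above does not prescribe a weighting or a similarity measure), and this is not a description of how Lucene computes scores.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent text as a sparse vector mapping each term to its count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[term] * b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


query = to_vector("apache lucene search")
doc_a = to_vector("lucene is a search library from apache")
doc_b = to_vector("recipes for baking bread")

score_a = cosine_similarity(query, doc_a)  # shares three terms with the query
score_b = cosine_similarity(query, doc_b)  # shares none, so the score is 0.0
```

Ranking documents by this score places doc_a above doc_b for the query, which is the kind of relevancy ranking the model is used for.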
