<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; Payloads</title>
	<atom:link href="http://www.lucidimagination.com/blog/category/payloads/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Apache Solr 1.5 on the move with more &#8220;functionality&#8221;</title>
		<link>http://www.lucidimagination.com/blog/2009/12/12/apache-solr-1-5-on-the-move-with-more-functionality/</link>
		<comments>http://www.lucidimagination.com/blog/2009/12/12/apache-solr-1-5-on-the-move-with-more-functionality/#comments</comments>
		<pubDate>Sat, 12 Dec 2009 23:45:26 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[functions]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Payloads]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[spatial search]]></category>
		<category><![CDATA[ZooKeeper]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=1404</guid>
		<description><![CDATA[<p>The paint is barely dry on <a href="http://lucene.apache.org/solr">Apache Solr</a> 1.4 and the community is already on the move for Solr 1.5 (which may actually be Solr 2.0, but for now let&#8217;s call it 1.5).</p>
<p>I&#8217;m particularly excited about a few things:</p>
<ol>
<li>Massive scalability capabilities via distributed search, indexing and shard management &#8211; Up until now, Solr scales pretty well on the search side (I&#8217;ve seen billion+ document instances and we&#8217;ve benchmarked it at that level too), </li>&#8230;</ol>]]></description>
			<content:encoded><![CDATA[<p>The paint is barely dry on <a href="http://lucene.apache.org/solr">Apache Solr</a> 1.4 and the community is already on the move for Solr 1.5 (which may actually be Solr 2.0, but for now let&#8217;s call it 1.5).</p>
<p>I&#8217;m particularly excited about a few things:</p>
<ol>
<li>Massive scalability capabilities via distributed search, indexing and shard management &#8211; Up until now, Solr scales pretty well on the search side (I&#8217;ve seen billion+ document instances and we&#8217;ve benchmarked it at that level too), but the work underway in Solr 1.5 will take it to a whole new level, thanks to the integration of Apache <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a> and other distributed technologies.  For those interested, check out the &#8220;cloud&#8221; branch in <a href="https://svn.apache.org/repos/asf/lucene/solr/branches/cloud/">SVN</a>.</li>
<li>Functions, functions, functions!  We&#8217;ve already added a bunch of <a href="http://wiki.apache.org/solr/FunctionQuery">functions</a> (see my <a href="http://www.lucidimagination.com/blog/2009/11/20/fun-with-solr-functions/">earlier post</a>) and I see more on the horizon.  Additionally, I see great value in adding, for lack of a better phrase, aggregating functions to the mix (via <a href="https://issues.apache.org/jira/browse/SOLR-1622">SOLR-1622</a>).  This will allow application designers to do much more sophisticated math across a search result set than what is currently available via the StatsComponent.  In some ways, this can empower business intelligence applications on top of Solr (I realize it is just a small piece of the BI pie) as well as more sophisticated mathematical applications.</li>
<li>Spatial Search!  A lot of people want spatial search, and Solr could have simply harnessed a really nice existing package (LocalSolr), just as many already do.  Instead, by stepping back and looking at spatial in the context of the bigger picture of things that would be nice to have in Solr (see <a href="https://issues.apache.org/jira/browse/SOLR-773">SOLR-773</a>), the community will be able to not only implement spatial search (by leveraging key pieces of LocalSolr where appropriate), but will also get a whole bevy of other features, including:
<ol>
<li>Sort By Function &#8211; Instead of a one off that sorts solely by distance, why not enable Solr users to sort by any arbitrary function?  I just committed this tonight via <a href="https://issues.apache.org/jira/browse/SOLR-1297">SOLR-1297</a>.</li>
<li>&#8220;Poly&#8221; Field Types &#8211; Thanks to <a href="https://issues.apache.org/jira/browse/SOLR-1131">SOLR-1131</a>, Solr&#8217;s FieldType mechanism can be used to represent multiple underlying fields.  This is especially useful for representing things like points in an <em>n</em>-dimensional space, Cartesian Tiers (zoom levels) and other cool things.  Moreover, it shows the types of abstractions Solr can overlay on the already powerful Apache Lucene to provide even more functionality.</li>
<li>Facet By Function &#8211; Sure, it&#8217;s great to put your distances into buckets, but why not put the result of any function into buckets?  See <a href="https://issues.apache.org/jira/browse/SOLR-1581">SOLR-1581</a>.</li>
<li>Spatial Query Parsers &#8211; aka geocoding &#8211; Parse things like street addresses, etc. and get back appropriate Query instances. See <a href="https://issues.apache.org/jira/browse/SOLR-1568">SOLR-1568</a> and <a href="https://issues.apache.org/jira/browse/SOLR-1578">SOLR-1578</a>.</li>
<li>Several different distance functions, including haversine (great circle), Manhattan and Euclidean (Solr actually now supports all p-norms as distance functions).  See <a href="https://issues.apache.org/jira/browse/SOLR-1302">SOLR-1302</a>.  I even added in the ability to do String distance calculations using Levenshtein (edit), Jaro-Winkler and n-gram (basically all of the Lucene spellchecker distance measures), as well as any user defined String Distance calculation.</li>
<li>&#8220;pseudo&#8221; fields &#8211; Instead of just hacking the ability to put a distance calculation into the result, why not allow the response to stream out &#8220;fields&#8221; based on things like functions or other user defined values?  See <a href="https://issues.apache.org/jira/browse/SOLR-1298">SOLR-1298</a>.</li>
</ol>
</li>
<li>Field Collapsing &#8211; I haven&#8217;t had time to work on it, but I suspect Field Collapsing will finally make it into 1.5.  Field Collapsing allows Solr to &#8220;roll-up&#8221; similar results much like you see on many Internet search sites that indent results from the same domain.</li>
<li>Payload and Span Query support &#8211; Solr&#8217;s been able to index payloads for some time now, but it still requires a user to hook in their own query parser support.  It would also be really great to see functions that can work on payloads, too. See <a href="https://issues.apache.org/jira/browse/SOLR-1337">SOLR-1337</a> and <a href="https://issues.apache.org/jira/browse/SOLR-1485">SOLR-1485</a>.</li>
</ol>
<p>Of course, as I always say, &#8220;in open source, you never know where the next good idea is going to come from&#8221;, so I have total faith that the Solr community will come up with a plethora of other great new features, as well as the usual bug fixes, etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/12/12/apache-solr-1-5-on-the-move-with-more-functionality/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Getting Started with Payloads</title>
		<link>http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/</link>
		<comments>http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/#comments</comments>
		<pubDate>Wed, 05 Aug 2009 14:20:52 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[BoostingTermQuery]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[Payloads]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=884</guid>
		<description><![CDATA[<p>Mark Miller recently <a href="http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/">posted</a> a brief intro to Span Queries, so I thought I would piggyback on top of his work and show how to get started with <a href="http://www.lucidimagination.com/search/s:lucid?q=payload">Payloads</a> (see also [1]).</p>
<h2>Introduction</h2>
<p>Like Spans, payloads involve the position of terms, but go one step further.  Namely, a Payload in Apache Lucene is an arbitrary byte array stored at a specific position (i.e. a specific token/term) in the index.  A payload can be used to &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Mark Miller recently <a href="http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/">posted</a> a brief intro to Span Queries, so I thought I would piggyback on top of his work and show how to get started with <a href="http://www.lucidimagination.com/search/s:lucid?q=payload">Payloads</a> (see also [1]).</p>
<h2>Introduction</h2>
<p>Like Spans, payloads involve the position of terms, but go one step further.  Namely, a Payload in Apache Lucene is an arbitrary byte array stored at a specific position (i.e. a specific token/term) in the index.  A payload can be used to store weights for specific terms or things like part of speech tags or other semantic information.  If you read Brin and Page&#8217;s (you know, the Google guys) original paper <a href="http://infolab.stanford.edu/~backrub/google.html">The Anatomy of a Large-Scale Hypertextual Web Search Engine</a>, they describe what is essentially payload functionality, whereby they store information about font, etc. at a specific position in the index (remember when you could get your pages ranked number one by using really big fonts?) and then utilize it in search.</p>
<p>There are three parts to taking advantage of payloads in Lucene.  Solr requires a fourth step, which I will explain in a moment.</p>
<ol>
<li>Add a Payload to one or more Tokens during indexing.</li>
<li>Override the Similarity class to handle scoring payloads.</li>
<li>Use a payload-aware Query during your search.</li>
</ol>
<p>For Solr, step 3 requires you to have your own Query Parser, as none of the existing Solr Query Parsers support the BoostingTermQuery.  Thus, the fourth step for Solr is to add a Query Parser that supports payloads (and Spans would be nice, too!  Please donate if you do this!).</p>
<h2>Adding Payloads during indexing</h2>
<p>(I&#8217;m using Lucene 2.9-dev)</p>
<p>I&#8217;m going to use the same indexing code I did for my post on <a href="http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/">co-occurrence analysis</a>, but with a few modifications.</p>
<p>First off, I&#8217;m going to change Analyzers to one of my own creation:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">class</span> PayloadAnalyzer <span style="color: #000000; font-weight: bold;">extends</span> Analyzer <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">private</span> PayloadEncoder encoder<span style="color: #339933;">;</span>
&nbsp;
    PayloadAnalyzer<span style="color: #009900;">&#40;</span>PayloadEncoder encoder<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">encoder</span> <span style="color: #339933;">=</span> encoder<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">public</span> TokenStream tokenStream<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> fieldName, <span style="color: #003399;">Reader</span> reader<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      TokenStream result <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> WhitespaceTokenizer<span style="color: #009900;">&#40;</span>reader<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      result <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> LowerCaseFilter<span style="color: #009900;">&#40;</span>result<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      result <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> DelimitedPayloadTokenFilter<span style="color: #009900;">&#40;</span>result, <span style="color: #0000ff;">'|'</span>, encoder<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #000000; font-weight: bold;">return</span> result<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span></pre></div></div>

<p>In this Analyzer, I have the basic whitespace tokenizer and a lower case filter, and then I add the recently introduced DelimitedPayloadTokenFilter (DPTF).  The DPTF allows you to add payloads to tokens simply by marking up the tokens with a special character followed by the payload value.  For instance, I changed my sample docs from the co-occurrence example to now include payload information.  Specifically, I said that all nouns should be weighted by 10, all verbs by 5 and all adjectives by 2 (I used <a href="http://l2r.cs.uiuc.edu/~cogcomp/pos_demo.php">http://l2r.cs.uiuc.edu/~cogcomp/pos_demo.php</a> to tag the sentences; any errors are likely mine).  Everything else has no payload.  I also stripped all punctuation.  My DOCS array now looks like:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> DOCS <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
           <span style="color: #0000ff;">&quot;The quick|2.0 red|2.0 fox|10.0 jumped|5.0 over the lazy|2.0 brown|2.0 dogs|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;The quick red fox jumped over the lazy brown dogs&quot;</span>,<span style="color: #666666; font-style: italic;">//no boosts</span>
          <span style="color: #0000ff;">&quot;The quick|2.0 red|2.0 fox|10.0 jumped|5.0 over the old|2.0 brown|2.0 box|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;Mary|10.0 had a little|2.0 lamb|10.0 whose fleece|10.0 was|5.0 white|2.0 as snow|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;Mary had a little lamb whose fleece was white as snow&quot;</span>,
          <span style="color: #0000ff;">&quot;Mary|10.0 takes on Wolf|10.0 Restoration|10.0 project|10.0 despite ties|10.0 to sheep|10.0 farming|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;Mary|10.0 who lives|5.0 on a farm|10.0 is|5.0 happy|2.0 that she|10.0 takes|5.0 a walk|10.0 every day|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;Moby|10.0 Dick|10.0 is|5.0 a story|10.0 of a whale|10.0 and a man|10.0 obsessed|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;The robber|10.0 wore|5.0 a black|2.0 fleece|10.0 jacket|10.0 and a baseball|10.0 cap|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;The English|10.0 Springer|10.0 Spaniel|10.0 is|5.0 the best|2.0 of all dogs|10.0&quot;</span>
  <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The DOCS array simply marks each noun, verb and adjective with a | (pipe symbol) and then a float indicating the boost.  I also added some docs that have no boosts at all to demonstrate the differences at query time.  The DPTF will then use this to encode the payloads using the PayloadEncoder. A PayloadEncoder is an interface that tells the DPTF how to convert the payload to a byte array. Also note that Lucene&#8217;s contrib/analysis package contains several other TokenFilters for adding payloads to a Token and, of course, you can write your own as well.  Furthermore, the PayloadHelper class can help encode/decode payloads for common types.</p>
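<p>To make the token markup concrete, here is a minimal plain-Java sketch of what happens to a single token such as <code>fox|10.0</code>: the text before the delimiter becomes the term and the text after it is turned into a byte array.  This is an illustration only, not Lucene code.  The class and method names are hypothetical, and the 4-byte big-endian float layout is an assumption made for the example.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.nio.ByteBuffer;

// Hypothetical helper, not part of Lucene: mimics what a delimited
// payload filter conceptually does to one whitespace-separated token.
class DelimitedPayloadSketch {

  // Returns the term text before the delimiter, or the whole token if there is none.
  public static String term(String token, char delimiter) {
    int pos = token.indexOf(delimiter);
    return pos == -1 ? token : token.substring(0, pos);
  }

  // Returns the float payload as 4 big-endian bytes, or null if the token has no payload.
  public static byte[] payload(String token, char delimiter) {
    int pos = token.indexOf(delimiter);
    if (pos == -1) {
      return null;
    }
    float value = Float.parseFloat(token.substring(pos + 1));
    return ByteBuffer.allocate(4).putFloat(value).array();
  }

  public static void main(String[] args) {
    for (String token : "the quick|2.0 red|2.0 fox|10.0".split(" ")) {
      byte[] bytes = payload(token, '|');
      String shown = bytes == null ? "none" : String.valueOf(ByteBuffer.wrap(bytes).getFloat());
      System.out.println(term(token, '|') + " payload=" + shown);
    }
  }
}</pre></div></div>

<p>Tokens without a delimiter simply carry no payload, which is how the unboosted docs in the DOCS array behave.</p>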
<h2>Overriding the Similarity Class</h2>
<p>The next step, which should happen before indexing, is to override the Similarity class to handle payloads.  While it isn&#8217;t strictly required that this happens before indexing in <strong>THIS</strong> case, it is a good habit to do in case you have made other changes to the Similarity class that are required during indexing (such as overriding how norms are encoded.)</p>
<p>Overriding the Similarity is done on both the IndexWriter and the IndexSearcher.  See [3] below for the full code, including the calls to set the similarity.  My Similarity implementation simply converts the byte array to a float and returns the float, as in:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">class</span> PayloadSimilarity <span style="color: #000000; font-weight: bold;">extends</span> DefaultSimilarity <span style="color: #009900;">&#123;</span>
    @Override
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">float</span> scorePayload<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> fieldName, <span style="color: #000066; font-weight: bold;">byte</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> bytes, <span style="color: #000066; font-weight: bold;">int</span> offset, <span style="color: #000066; font-weight: bold;">int</span> length<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #000000; font-weight: bold;">return</span> PayloadHelper.<span style="color: #006633;">decodeFloat</span><span style="color: #009900;">&#40;</span>bytes, offset<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//we can ignore length here, because we know it is encoded as 4 bytes</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
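
<p>The implementation above ignores the length argument because every payload in this index is known to be a 4-byte float.  If that is not guaranteed, a more defensive decode step might look like the sketch below.  It is plain Java and does not extend any Lucene class.  The class name is hypothetical, and returning 1.0 as a neutral boost for a missing or oddly sized payload is my assumption for the example, not documented Lucene behavior.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.nio.ByteBuffer;

// Hypothetical stand-in for the decode step of a payload-aware Similarity.
class PayloadDecodeSketch {

  // Decodes a 4-byte big-endian float starting at offset. Returns 1.0f,
  // which leaves a multiplied score unchanged, when there is no payload
  // or when the payload is not exactly 4 bytes long.
  public static float scorePayload(byte[] bytes, int offset, int length) {
    if (bytes == null || length != 4) {
      return 1.0f;
    }
    return ByteBuffer.wrap(bytes, offset, length).getFloat();
  }

  public static void main(String[] args) {
    byte[] ten = ByteBuffer.allocate(4).putFloat(10.0f).array();
    System.out.println("4-byte payload: " + scorePayload(ten, 0, 4));
    System.out.println("no payload:     " + scorePayload(null, 0, 0));
  }
}</pre></div></div>

<p>Whether a neutral 1.0 is the right fallback depends on how the payload factors into your score; for a multiplicative boost it is the value that changes nothing.</p>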

<h2>Executing the Query</h2>
<p>Currently, Lucene has one payload aware Query called the BoostingTermQuery (BTQ for short,  see [2] for another Payload aware query that may be in Lucene 2.9), which can be used just like any other query.  For instance:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">IndexSearcher searcher <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IndexSearcher<span style="color: #009900;">&#40;</span>dir, <span style="color: #000066; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
searcher.<span style="color: #006633;">setSimilarity</span><span style="color: #009900;">&#40;</span>payloadSimilarity<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
BoostingTermQuery btq <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> BoostingTermQuery<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Term<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;body&quot;</span>, <span style="color: #0000ff;">&quot;fox&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
TopDocs topDocs <span style="color: #339933;">=</span> searcher.<span style="color: #006633;">search</span><span style="color: #009900;">&#40;</span>btq, <span style="color: #cc66cc;">10</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> topDocs.<span style="color: #006633;">scoreDocs</span>.<span style="color: #006633;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
   ScoreDoc doc <span style="color: #339933;">=</span> topDocs.<span style="color: #006633;">scoreDocs</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
   <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Doc: &quot;</span> <span style="color: #339933;">+</span> doc.<span style="color: #006633;">toString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
   <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Explain: &quot;</span> <span style="color: #339933;">+</span> searcher.<span style="color: #006633;">explain</span><span style="color: #009900;">&#40;</span>btq, doc.<span style="color: #006633;">doc</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>In this example, I create the BTQ and hand it to the searcher and then print out the results.  Easy peasy, yet so powerful.</p>
<p>Running this yields:</p>
<blockquote><p>-----------<br />
Results for body:fox of type: org.apache.lucene.search.payloads.BoostingTermQuery<br />
Doc: doc=0 score=4.2344446<br />
Explain: 4.234444 = (MATCH) fieldWeight(body:fox in 0), product of:<br />
7.071068 = (MATCH) btq, product of:<br />
0.70710677 = tf(phraseFreq=0.5)<br />
10.0 = scorePayload(&#8230;)<br />
1.9162908 = idf(body: fox=3)<br />
0.3125 = fieldNorm(field=body, doc=0)</p>
<p>Doc: doc=2 score=4.2344446<br />
Explain: 4.234444 = (MATCH) fieldWeight(body:fox in 2), product of:<br />
7.071068 = (MATCH) btq, product of:<br />
0.70710677 = tf(phraseFreq=0.5)<br />
10.0 = scorePayload(&#8230;)<br />
1.9162908 = idf(body: fox=3)<br />
0.3125 = fieldNorm(field=body, doc=2)</p>
<p>Doc: doc=1 score=0.42344445<br />
Explain: 0.42344445 = (MATCH) fieldWeight(body:fox in 1), product of:<br />
0.70710677 = (MATCH) btq, product of:<br />
0.70710677 = tf(phraseFreq=0.5)<br />
1.0 = scorePayload(&#8230;)<br />
1.9162908 = idf(body: fox=3)<br />
0.3125 = fieldNorm(field=body, doc=1)</p></blockquote>
<p>Notice how Doc 0 and Doc 2, which both carry a payload of 10.0 on the word &#8220;fox&#8221;, rank ahead of Doc 1, which has no payload on it, even though all three have the same term frequency and length.</p>
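<p>The explain output above boils down to a product of four factors: tf, the payload score, idf and the field norm.  The following sketch just redoes that multiplication in plain Java using the numbers printed above.  It involves no Lucene classes, and the class name is hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">// Plain arithmetic over the factors printed by explain(); not Lucene code.
class BtqScoreSketch {

  // Multiplies the four factors that appear in the explain output.
  public static float score(float tf, float payload, float idf, float fieldNorm) {
    return tf * payload * idf * fieldNorm;
  }

  public static void main(String[] args) {
    float tf = (float) Math.sqrt(0.5); // matches tf(phraseFreq=0.5) = 0.70710677
    float idf = 1.9162908f;            // idf(body: fox=3)
    float fieldNorm = 0.3125f;         // fieldNorm(field=body)
    System.out.println("payload 10.0: " + score(tf, 10.0f, idf, fieldNorm));
    System.out.println("payload 1.0:  " + score(tf, 1.0f, idf, fieldNorm));
  }
}</pre></div></div>

<p>The two results come out at roughly 4.234 and 0.4234, in line with the scores reported for the boosted and unboosted docs: the payload is the only factor that differs between them.</p>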
<p>Running a simple TermQuery (ignoring payloads) with the exact same Term, on the other hand, yields:</p>
<blockquote><p>-----------<br />
Results for body:fox of type: org.apache.lucene.search.TermQuery<br />
Doc: doc=0 score=0.59884083<br />
Explain: 0.59884083 = (MATCH) fieldWeight(body:fox in 0), product of:<br />
1.0 = tf(termFreq(body:fox)=1)<br />
1.9162908 = idf(docFreq=3, numDocs=10)<br />
0.3125 = fieldNorm(field=body, doc=0)</p>
<p>Doc: doc=1 score=0.59884083<br />
Explain: 0.59884083 = (MATCH) fieldWeight(body:fox in 1), product of:<br />
1.0 = tf(termFreq(body:fox)=1)<br />
1.9162908 = idf(docFreq=3, numDocs=10)<br />
0.3125 = fieldNorm(field=body, doc=1)</p>
<p>Doc: doc=2 score=0.59884083<br />
Explain: 0.59884083 = (MATCH) fieldWeight(body:fox in 2), product of:<br />
1.0 = tf(termFreq(body:fox)=1)<br />
1.9162908 = idf(docFreq=3, numDocs=10)<br />
0.3125 = fieldNorm(field=body, doc=2)</p></blockquote>
<p>As you can see, in the TermQuery case, all the docs are scored exactly the same.</p>
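<p>For the curious, the TermQuery numbers can be reproduced by hand as well.  The sketch below assumes the default formulas as I understand them for this generation of Lucene&#8217;s DefaultSimilarity, namely tf as the square root of the term frequency and idf as 1 + ln(numDocs / (docFreq + 1)), and takes the field norm of 0.3125 straight from the explain output.  It is plain Java with a hypothetical class name, not Lucene code.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">// Recomputes the TermQuery score from the explain output; not Lucene code.
class TermQueryScoreSketch {

  // Assumed default idf: 1 + ln(numDocs / (docFreq + 1)).
  public static float idf(int docFreq, int numDocs) {
    return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
  }

  // Assumed default tf: square root of the raw term frequency.
  public static float tf(int termFreq) {
    return (float) Math.sqrt(termFreq);
  }

  public static void main(String[] args) {
    float fieldNorm = 0.3125f; // taken from the explain output above
    float score = tf(1) * idf(3, 10) * fieldNorm;
    System.out.println("idf=" + idf(3, 10) + " score=" + score);
  }
}</pre></div></div>

<p>With docFreq=3 and numDocs=10 this gives an idf of about 1.9163 and a score of about 0.5988 for every matching doc, since nothing in the formula distinguishes one doc from another here.</p>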
<h2>Next Steps</h2>
<p>As you can see from above, getting started with Payloads is pretty easy.  In reality, the only hard part is determining what exactly to put in your payload and then how it should factor into your score.  Lucene takes care of the rest.  Tools like UIMA and OpenNLP, as well as proprietary offerings from various vendors, can often be used to provide higher level lexical, syntactical and semantic information about tokens, thus giving you the power to create very expressive payloads and richer search applications.</p>
<p>[1] See Michael Busch&#8217;s talk at the last SF Meetup for more details on payloads: <a href="http://www.meetup.com/SFBay-Lucene-Solr-Meetup/files/">http://www.meetup.com/SFBay-Lucene-Solr-Meetup/files/</a></p>
<p>[2] <a href="https://issues.apache.org/jira/browse/LUCENE-1341">https://issues.apache.org/jira/browse/LUCENE-1341</a></p>
<p>[3] Full class:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">com.lucidimagination.noodles</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">junit.framework.TestCase</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.store.Directory</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.store.RAMDirectory</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.index.IndexWriter</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.index.Term</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.document.Document</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.document.Field</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.analysis.Analyzer</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.analysis.TokenStream</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.analysis.WhitespaceTokenizer</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.analysis.LowerCaseFilter</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.analysis.payloads.DelimitedPayloadTokenFilter</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.analysis.payloads.PayloadEncoder</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.analysis.payloads.FloatEncoder</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.analysis.payloads.PayloadHelper</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.DefaultSimilarity</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.IndexSearcher</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.TopDocs</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.ScoreDoc</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.Query</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.TermQuery</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.payloads.BoostingTermQuery</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.spans.SpanNearQuery</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.spans.SpanQuery</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.Reader</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.IOException</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 *
 *
 **/</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> PayloadTest <span style="color: #000000; font-weight: bold;">extends</span> TestCase <span style="color: #009900;">&#123;</span>
Directory dir<span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> DOCS <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
          <span style="color: #0000ff;">&quot;The quick|2.0 red|2.0 fox|10.0 jumped|5.0 over the lazy|2.0 brown|2.0 dogs|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;The quick red fox jumped over the lazy brown dogs&quot;</span>,<span style="color: #666666; font-style: italic;">//no boosts</span>
          <span style="color: #0000ff;">&quot;The quick|2.0 red|2.0 fox|10.0 jumped|5.0 over the old|2.0 brown|2.0 box|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;Mary|10.0 had a little|2.0 lamb|10.0 whose fleece|10.0 was|5.0 white|2.0 as snow|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;Mary had a little lamb whose fleece was white as snow&quot;</span>,
          <span style="color: #0000ff;">&quot;Mary|10.0 takes on Wolf|10.0 Restoration|10.0 project|10.0 despite ties|10.0 to sheep|10.0 farming|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;Mary|10.0 who lives|5.0 on a farm|10.0 is|5.0 happy|2.0 that she|10.0 takes|5.0 a walk|10.0 every day|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;Moby|10.0 Dick|10.0 is|5.0 a story|10.0 of a whale|10.0 and a man|10.0 obsessed|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;The robber|10.0 wore|5.0 a black|2.0 fleece|10.0 jacket|10.0 and a baseball|10.0 cap|10.0&quot;</span>,
          <span style="color: #0000ff;">&quot;The English|10.0 Springer|10.0 Spaniel|10.0 is|5.0 the best|2.0 of all dogs|10.0&quot;</span>
  <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
  <span style="color: #000000; font-weight: bold;">protected</span> PayloadSimilarity payloadSimilarity<span style="color: #339933;">;</span>
&nbsp;
  @Override
  <span style="color: #000000; font-weight: bold;">protected</span> <span style="color: #000066; font-weight: bold;">void</span> setUp<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> <span style="color: #009900;">&#123;</span>
    dir <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> RAMDirectory<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    PayloadEncoder encoder <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> FloatEncoder<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    IndexWriter writer <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IndexWriter<span style="color: #009900;">&#40;</span>dir, <span style="color: #000000; font-weight: bold;">new</span> PayloadAnalyzer<span style="color: #009900;">&#40;</span>encoder<span style="color: #009900;">&#41;</span>, <span style="color: #000066; font-weight: bold;">true</span>, IndexWriter.<span style="color: #006633;">MaxFieldLength</span>.<span style="color: #006633;">UNLIMITED</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    payloadSimilarity <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> PayloadSimilarity<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    writer.<span style="color: #006633;">setSimilarity</span><span style="color: #009900;">&#40;</span>payloadSimilarity<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> DOCS.<span style="color: #006633;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #003399;">Document</span> doc <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Document</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">Field</span> id <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Field</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;id&quot;</span>, <span style="color: #0000ff;">&quot;doc_&quot;</span> <span style="color: #339933;">+</span> i, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Store</span>.<span style="color: #006633;">YES</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Index</span>.<span style="color: #006633;">NOT_ANALYZED_NO_NORMS</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      doc.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>id<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #666666; font-style: italic;">//Analyze the body so the payload filter runs; payloads are stored alongside the term positions</span>
      <span style="color: #003399;">Field</span> text <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Field</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;body&quot;</span>, DOCS<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Store</span>.<span style="color: #006633;">NO</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Index</span>.<span style="color: #006633;">ANALYZED</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      doc.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>text<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      writer.<span style="color: #006633;">addDocument</span><span style="color: #009900;">&#40;</span>doc<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    writer.<span style="color: #006633;">close</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> testPayloads<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> <span style="color: #009900;">&#123;</span>
    IndexSearcher searcher <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IndexSearcher<span style="color: #009900;">&#40;</span>dir, <span style="color: #000066; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    searcher.<span style="color: #006633;">setSimilarity</span><span style="color: #009900;">&#40;</span>payloadSimilarity<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//set the similarity.  Very important</span>
    BoostingTermQuery btq <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> BoostingTermQuery<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Term<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;body&quot;</span>, <span style="color: #0000ff;">&quot;fox&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    TopDocs topDocs <span style="color: #339933;">=</span> searcher.<span style="color: #006633;">search</span><span style="color: #009900;">&#40;</span>btq, <span style="color: #cc66cc;">10</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    printResults<span style="color: #009900;">&#40;</span>searcher, btq, topDocs<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
    TermQuery tq <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> TermQuery<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Term<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;body&quot;</span>, <span style="color: #0000ff;">&quot;fox&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    topDocs <span style="color: #339933;">=</span> searcher.<span style="color: #006633;">search</span><span style="color: #009900;">&#40;</span>tq, <span style="color: #cc66cc;">10</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
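    printResults<span style="color: #009900;">&#40;</span>searcher, tq, topDocs<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    searcher.<span style="color: #006633;">close</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//release the underlying reader</span>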
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000066; font-weight: bold;">void</span> printResults<span style="color: #009900;">&#40;</span>IndexSearcher searcher, Query query, TopDocs topDocs<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;-----------&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Results for &quot;</span> <span style="color: #339933;">+</span> query <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; of type: &quot;</span> <span style="color: #339933;">+</span> query.<span style="color: #006633;">getClass</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">getName</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> topDocs.<span style="color: #006633;">scoreDocs</span>.<span style="color: #006633;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      ScoreDoc doc <span style="color: #339933;">=</span> topDocs.<span style="color: #006633;">scoreDocs</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Doc: &quot;</span> <span style="color: #339933;">+</span> doc.<span style="color: #006633;">toString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Explain: &quot;</span> <span style="color: #339933;">+</span> searcher.<span style="color: #006633;">explain</span><span style="color: #009900;">&#40;</span>query, doc.<span style="color: #006633;">doc</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">class</span> PayloadSimilarity <span style="color: #000000; font-weight: bold;">extends</span> DefaultSimilarity <span style="color: #009900;">&#123;</span>
    @Override
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">float</span> scorePayload<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> fieldName, <span style="color: #000066; font-weight: bold;">byte</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> bytes, <span style="color: #000066; font-weight: bold;">int</span> offset, <span style="color: #000066; font-weight: bold;">int</span> length<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #000000; font-weight: bold;">return</span> PayloadHelper.<span style="color: #006633;">decodeFloat</span><span style="color: #009900;">&#40;</span>bytes, offset<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//we can ignore length here, because we know it is encoded as 4 bytes</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">class</span> PayloadAnalyzer <span style="color: #000000; font-weight: bold;">extends</span> Analyzer <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">private</span> PayloadEncoder encoder<span style="color: #339933;">;</span>
&nbsp;
    PayloadAnalyzer<span style="color: #009900;">&#40;</span>PayloadEncoder encoder<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">encoder</span> <span style="color: #339933;">=</span> encoder<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">public</span> TokenStream tokenStream<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> fieldName, <span style="color: #003399;">Reader</span> reader<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      TokenStream result <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> WhitespaceTokenizer<span style="color: #009900;">&#40;</span>reader<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      result <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> LowerCaseFilter<span style="color: #009900;">&#40;</span>result<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      result <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> DelimitedPayloadTokenFilter<span style="color: #009900;">&#40;</span>result, <span style="color: #0000ff;">'|'</span>, encoder<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #000000; font-weight: bold;">return</span> result<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>
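<p>The comment in scorePayload says the length can be ignored because the boost is encoded as 4 bytes. As a rough sketch of that round trip, and assuming the payload is a 4-byte big-endian IEEE 754 float (my understanding of what FloatEncoder and PayloadHelper produce, not something verified here), the encoding and decoding could look like the following. The class FloatPayloadSketch and its encode/decode methods are made up for illustration and are not part of Lucene.</p>

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the float payload round trip; not a Lucene class.
public class FloatPayloadSketch {

  // Encode a boost as 4 bytes, big-endian (ByteBuffer's default byte order).
  public static byte[] encode(float boost) {
    return ByteBuffer.allocate(4).putFloat(boost).array();
  }

  // Decode the 4 bytes starting at offset back into the boost value.
  public static float decode(byte[] bytes, int offset) {
    return ByteBuffer.wrap(bytes, offset, 4).getFloat();
  }

  public static void main(String[] args) {
    byte[] payload = encode(10.0f);          // e.g. the boost written as fox|10.0
    System.out.println(payload.length);      // prints 4
    System.out.println(decode(payload, 0));  // prints 10.0
  }
}
```

<p>Because every payload in this example has the same fixed width, the decoder only needs the byte array and the starting offset, which is why the length argument goes unused in the test above.</p>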

]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
	</channel>
</rss>

