<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lucid Imagination &#187; Span Queries</title>
	<atom:link href="http://www.lucidimagination.com/blog/category/span-queries/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:12:03 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Open Source Escrow to the Rescue</title>
		<link>http://www.lucidimagination.com/blog/2010/08/19/open-source-escrow-to-the-rescue/</link>
		<comments>http://www.lucidimagination.com/blog/2010/08/19/open-source-escrow-to-the-rescue/#comments</comments>
		<pubDate>Thu, 19 Aug 2010 21:12:20 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[apache]]></category>
		<category><![CDATA[Enterprise Search]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Lucene]]></category>
		<category><![CDATA[LucidGaze]]></category>
		<category><![CDATA[nutch]]></category>
		<category><![CDATA[PyLucene]]></category>
		<category><![CDATA[Span Queries]]></category>
		<category><![CDATA[ZooKeeper]]></category>
		<category><![CDATA[enterprise search]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Solr]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=2341</guid>
		<description><![CDATA[<p>Do you remember this scenario from days of yore?</p>
<ul>
<li>Company A buys a software license from Company B, a startup.</li>
<li>Company A crosses its fingers that Company B doesn’t go bankrupt and disappear, along with the source code for Company A’s mission-critical software.</li>
<li>Company B goes kaput.</li>
<li>Company A is left with some machine-readable binary code that it is powerless to develop or use.</li>
</ul>
<p>Source code escrow has changed the outcome of this sticky situation &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Do you remember this scenario from days of yore?</p>
<ul>
<li>Company A buys a software license from Company B, a startup.</li>
<li>Company A crosses its fingers that Company B doesn’t go bankrupt and disappear, along with the source code for Company A’s mission-critical software.</li>
<li>Company B goes kaput.</li>
<li>Company A is left with some machine-readable binary code that it is powerless to develop or use.</li>
</ul>
<p>Source code escrow has changed the outcome of this sticky situation for the better, and here’s how: Countless software companies go out of business every year, and their code either disappears entirely or goes to another company that doesn’t do any development or maintenance on it. The concept of escrow is one way in which open source gives companies a chance to continue their contribution and innovation, because the code they wrote can outlive them and continue to be evolved by the community. I covered this topic in my most recent <a title="http://www.networkworld.com/community/blog/13681" href="http://www.networkworld.com/community/blog/13681">post</a> on the Network World open source subnet. I invite your feedback: what’s your experience with source code or open source escrow? Any best practices or cautionary tales to share? Looking forward to hearing from you.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2010/08/19/open-source-escrow-to-the-rescue/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Accessing words around a positional match in Lucene</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/</link>
		<comments>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/#comments</comments>
		<pubDate>Tue, 26 May 2009 18:50:15 +0000</pubDate>
		<dc:creator>Grant Ingersoll</dc:creator>
				<category><![CDATA[Lucene]]></category>
		<category><![CDATA[sentiment analysis]]></category>
		<category><![CDATA[Span Queries]]></category>
		<category><![CDATA[term vectors]]></category>

		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672</guid>
		<description><![CDATA[<p>From time to time, users on the Lucene mailing list ask a variant of the following question:</p>
<blockquote><p>Given a term match in a document, what&#8217;s the best way to get a window of words around that match?</p></blockquote>
<p>Getting a window of words around a match can be useful for a lot of things, including, to name a few:</p>
<ol>
<li>Highlighting (although I&#8217;d recommend using Lucene&#8217;s Highlighter package for that)</li>
<li>Co-occurrence analysis</li>
<li>Sentiment analysis</li>
<li>Question Answering</li>
</ol>
<p>Unfortunately, &#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>From time to time, users on the Lucene mailing list ask a variant of the following question:</p>
<blockquote><p>Given a term match in a document, what&#8217;s the best way to get a window of words around that match?</p></blockquote>
<p>Getting a window of words around a match can be useful for a lot of things, including, to name a few:</p>
<ol>
<li>Highlighting (although I&#8217;d recommend using Lucene&#8217;s Highlighter package for that)</li>
<li>Co-occurrence analysis</li>
<li>Sentiment analysis</li>
<li>Question Answering</li>
</ol>
<p>Unfortunately, given how inverted indexes are structured, retrieving content around a match isn&#8217;t efficient without doing some extra work during indexing.  In Lucene, this &#8220;extra work&#8221; involves creating and storing <a href="http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/TermFreqVector.html">Term Vectors</a> with position and offset information.</p>
<p>Storing Term Vector info can be done by adding in the appropriate code during Field construction, as in the following indexing example where I create an index from a few dummy documents (complete code is at the bottom of this post):</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">    RAMDirectory ramDir <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> RAMDirectory<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">//Index some made up content</span>
    IndexWriter writer <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IndexWriter<span style="color: #009900;">&#40;</span>ramDir, <span style="color: #000000; font-weight: bold;">new</span> StandardAnalyzer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000066; font-weight: bold;">true</span>, IndexWriter.<span style="color: #006633;">MaxFieldLength</span>.<span style="color: #006633;">UNLIMITED</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> DOCS.<span style="color: #006633;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
      <span style="color: #003399;">Document</span> doc <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Document</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">Field</span> id <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Field</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;id&quot;</span>, <span style="color: #0000ff;">&quot;doc_&quot;</span> <span style="color: #339933;">+</span> i, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Store</span>.<span style="color: #006633;">YES</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Index</span>.<span style="color: #006633;">NOT_ANALYZED_NO_NORMS</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      doc.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>id<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #666666; font-style: italic;">//Store both position and offset information</span>
      <span style="color: #003399;">Field</span> text <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Field</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;content&quot;</span>, DOCS<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Store</span>.<span style="color: #006633;">NO</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Index</span>.<span style="color: #006633;">ANALYZED</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">TermVector</span>.<span style="color: #006633;">WITH_POSITIONS_OFFSETS</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      doc.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>text<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      writer.<span style="color: #006633;">addDocument</span><span style="color: #009900;">&#40;</span>doc<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    writer.<span style="color: #006633;">close</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Notice the use of Field.TermVector.WITH_POSITIONS_OFFSETS when constructing the text Field.  This tells Lucene to store term vector information on a per-document basis (in other words, not inverted) with both position and offset information.  (Do note that other storage options are available; see <a href="http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/document/Field.TermVector.html">Field.TermVector</a>.  Also note that storing Term Vectors will cost you disk space.)</p>
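<p>To make the difference between positions and offsets concrete, here is a minimal plain-Java sketch of what a per-document term vector holds.  It is not Lucene&#8217;s API or storage format: the TermVectorSketch class, its Occurrence type and the letters-and-digits tokenizer are all made up for illustration, and unlike StandardAnalyzer it removes no stop words, so positions in a real index may differ.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy stand-in for a term vector; NOT Lucene's API or on-disk format.
class TermVectorSketch {
  /** One occurrence of a term in the document. */
  static final class Occurrence {
    final int position;     // 0-based token index
    final int startOffset;  // index of the first character
    final int endOffset;    // one past the last character

    Occurrence(int position, int startOffset, int endOffset) {
      this.position = position;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }

    @Override
    public String toString() {
      return "pos=" + position + " offsets=[" + startOffset + "," + endOffset + ")";
    }
  }

  /** Builds term -> occurrences for a single document. */
  static Map<String, List<Occurrence>> build(String text) {
    Map<String, List<Occurrence>> vector = new LinkedHashMap<>();
    int position = 0;
    int i = 0;
    while (i < text.length()) {
      // Skip separators (anything that is not a letter or digit).
      while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
        i++;
      }
      int start = i;
      // Consume one token.
      while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
        i++;
      }
      if (i > start) {
        String term = text.substring(start, i).toLowerCase(Locale.ROOT);
        vector.computeIfAbsent(term, k -> new ArrayList<>())
              .add(new Occurrence(position++, start, i));
      }
    }
    return vector;
  }

  public static void main(String[] args) {
    Map<String, List<Occurrence>> tv =
        build("Mary had a little lamb whose fleece was white as snow.");
    for (Map.Entry<String, List<Occurrence>> e : tv.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```

<p>With this toy tokenizer, &#8220;fleece&#8221; in the sample sentence is the token at position 6 and covers characters 29 up to (but not including) 35.  The position counts tokens; the offsets count characters.</p>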
<p>For completeness, the DOCS array looks like:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #003399;">String</span> <span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> DOCS <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #0000ff;">&quot;The quick red fox jumped over the lazy brown dogs.&quot;</span>,
        <span style="color: #0000ff;">&quot;Mary had a little lamb whose fleece was white as snow.&quot;</span>,
        <span style="color: #0000ff;">&quot;Moby Dick is a story of a whale and a man obsessed.&quot;</span>,
        <span style="color: #0000ff;">&quot;The robber wore a black fleece jacket and a baseball cap.&quot;</span>,
        <span style="color: #0000ff;">&quot;The English Springer Spaniel is the best of all dogs.&quot;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Now that we have an index created, we need to do a search.  In our case, we need to do a position-based search as opposed to the more traditional document-based search.  In other words, it is not good enough to simply know whether a term is in a document or not (think TermQuery); we need to know where in the document the match occurred.  Lucene enables position-based search through a series of Query classes collectively known as Span Queries.  (See <a href="http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/spans/SpanQuery.html">SpanQuery</a> and its derivatives in the <a href="http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/spans/package-summary.html">org.apache.lucene.search.spans</a> package.)</p>
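<p>The gap between a document-based and a position-based answer can be sketched with a toy positional index in plain Java.  Everything here (the PositionalIndexSketch class, its method names, the simple regex tokenizer) is hypothetical and only illustrates the idea; it says nothing about how Lucene actually stores postings.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy positional index; class and method names are hypothetical, not Lucene's.
class PositionalIndexSketch {
  // term -> (document id -> token positions within that document)
  private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

  void add(int docId, String text) {
    int position = 0;
    for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
      if (token.isEmpty()) {
        continue; // split() yields a leading empty string if the text starts with a separator
      }
      postings.computeIfAbsent(token, k -> new TreeMap<>())
              .computeIfAbsent(docId, k -> new ArrayList<>())
              .add(position++);
    }
  }

  /** Document-based answer: which documents contain the term? */
  Set<Integer> docsContaining(String term) {
    Map<Integer, List<Integer>> byDoc = postings.get(term);
    if (byDoc == null) {
      return new TreeSet<Integer>();
    }
    return new TreeSet<Integer>(byDoc.keySet());
  }

  /** Position-based answer: where in one document does the term occur? */
  List<Integer> positionsOf(String term, int docId) {
    Map<Integer, List<Integer>> byDoc = postings.get(term);
    if (byDoc == null || !byDoc.containsKey(docId)) {
      return Collections.emptyList();
    }
    return byDoc.get(docId);
  }

  public static void main(String[] args) {
    PositionalIndexSketch index = new PositionalIndexSketch();
    index.add(1, "Mary had a little lamb whose fleece was white as snow.");
    index.add(3, "The robber wore a black fleece jacket and a baseball cap.");
    System.out.println("docs: " + index.docsContaining("fleece"));
    System.out.println("positions in doc 1: " + index.positionsOf("fleece", 1));
  }
}
```

<p>The first method is all a plain term lookup needs; the second is the extra detail a position-based search has to expose.</p>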
<p>Again, an example is warranted.  Assume we wanted to find where the term &#8220;fleece&#8221; occurs.  In this case, let&#8217;s start by doing a &#8220;normal&#8221; search, wherein we submit a query to the index and print out the Document id and Score:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">    IndexSearcher searcher <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IndexSearcher<span style="color: #009900;">&#40;</span>ramDir<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// Do a search using SpanQuery</span>
    SpanTermQuery fleeceQ <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> SpanTermQuery<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Term<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;content&quot;</span>, <span style="color: #0000ff;">&quot;fleece&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    TopDocs results <span style="color: #339933;">=</span> searcher.<span style="color: #006633;">search</span><span style="color: #009900;">&#40;</span>fleeceQ, <span style="color: #cc66cc;">10</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> results.<span style="color: #006633;">scoreDocs</span>.<span style="color: #006633;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      ScoreDoc scoreDoc <span style="color: #339933;">=</span> results.<span style="color: #006633;">scoreDocs</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Score Doc: &quot;</span> <span style="color: #339933;">+</span> scoreDoc<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span></pre></div></div>

<p>That code looks pretty much like any basic search code with the exception that I substituted in a <a href="http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/spans/SpanTermQuery.html">SpanTermQuery</a> for what is often a TermQuery.  In fact, so far this isn&#8217;t all that interesting and it is likely to be slower than the comparable TermQuery too.</p>
<p>What does make it interesting?  If you look at the SpanQuery API, you will notice a method called getSpans().  The getSpans() method provides positional information about where a match occurred.  Thus, to print out the positional information, one might do:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">    Spans spans <span style="color: #339933;">=</span> fleeceQ.<span style="color: #006633;">getSpans</span><span style="color: #009900;">&#40;</span>searcher.<span style="color: #006633;">getIndexReader</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #009900;">&#40;</span>spans.<span style="color: #006633;">next</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> <span style="color: #000066; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
      <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Doc: &quot;</span> <span style="color: #339933;">+</span> spans.<span style="color: #006633;">doc</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; Start: &quot;</span> <span style="color: #339933;">+</span> spans.<span style="color: #006633;">start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; End: &quot;</span> <span style="color: #339933;">+</span> spans.<span style="color: #006633;">end</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span></pre></div></div>

<p>First off, notice that getting the Spans is completely independent of running the actual query.  In fact, you need not run the query first.  Second, the start and end values are the positions of the tokens, not the character offsets.</p>
<p>Now, given the position information, the question becomes how to get only those tokens around the match.  To answer that, we need a few things:</p>
<ol>
<li>The specification of a window in terms of positions.  For instance, I want the terms within two positions of the start and end of the span.</li>
<li>A <a href="http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/TermVectorMapper.html">TermVectorMapper</a> implementation that is aware of both the window and the position.  Think of a TermVectorMapper as the equivalent of a SAX parser for Lucene&#8217;s Term Vectors.  Basically, instead of assuming the data structure (like DOM) it provides callbacks and lets you, the programmer, decide on the data structures.  See the <a href="http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/PositionBasedTermVectorMapper.html">PositionBasedTermVectorMapper</a> for a useful implementation.</li>
</ol>
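<p>The SAX-style callback idea from the list above can be sketched in plain Java.  The TermCallback interface and the walk and termsInWindow methods below are hypothetical names chosen for illustration; they mimic the shape of the approach, not the actual TermVectorMapper signatures.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrates the callback style only; these types are hypothetical, not Lucene's.
class CallbackMapperSketch {
  /** The "SAX-like" part: the producer pushes each term to the consumer. */
  interface TermCallback {
    void map(String term, int[] positions);
  }

  /** Walks a toy term vector and builds no data structure of its own. */
  static void walk(Map<String, int[]> termVector, TermCallback callback) {
    for (Map.Entry<String, int[]> e : termVector.entrySet()) {
      callback.map(e.getKey(), e.getValue());
    }
  }

  /** One possible consumer: collect terms with a position in [start, end). */
  static List<String> termsInWindow(Map<String, int[]> termVector, int start, int end) {
    List<String> hits = new ArrayList<>();
    walk(termVector, (term, positions) -> {
      for (int p : positions) {
        if (p >= start && p < end) {
          hits.add(term);
          break; // one position inside the window is enough to keep the term
        }
      }
    });
    return hits;
  }
}
```

<p>The walker never decides how results are stored; a different callback could just as well count terms or build a map, which is the point of the SAX comparison.</p>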
<p>As a quick hack (and it is by no means production quality), I created the following code that modifies the printing code above:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">    WindowTermVectorMapper tvm <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> WindowTermVectorMapper<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">int</span> window <span style="color: #339933;">=</span> <span style="color: #cc66cc;">2</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//get the words within two of the match, inclusive of the boundaries</span>
    <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #009900;">&#40;</span>spans.<span style="color: #006633;">next</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> <span style="color: #000066; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Doc: &quot;</span> <span style="color: #339933;">+</span> spans.<span style="color: #006633;">doc</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; Start: &quot;</span> <span style="color: #339933;">+</span> spans.<span style="color: #006633;">start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; End: &quot;</span> <span style="color: #339933;">+</span> spans.<span style="color: #006633;">end</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #666666; font-style: italic;">//build up the window</span>
      tvm.<span style="color: #006633;">start</span> <span style="color: #339933;">=</span> spans.<span style="color: #006633;">start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> window<span style="color: #339933;">;</span>
      tvm.<span style="color: #006633;">end</span> <span style="color: #339933;">=</span> spans.<span style="color: #006633;">end</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> window<span style="color: #339933;">;</span>
      reader.<span style="color: #006633;">getTermFreqVector</span><span style="color: #009900;">&#40;</span>spans.<span style="color: #006633;">doc</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;content&quot;</span>, tvm<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span>WindowEntry entry <span style="color: #339933;">:</span> tvm.<span style="color: #006633;">entries</span>.<span style="color: #006633;">values</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Entry: &quot;</span> <span style="color: #339933;">+</span> entry<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #009900;">&#125;</span>
      <span style="color: #666666; font-style: italic;">//clear out the entries for the next round</span>
      tvm.<span style="color: #006633;">entries</span>.<span style="color: #006633;">clear</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span></pre></div></div>

<p>Now, in this chunk of code, I first create a WindowTermVectorMapper (WTVM, beautiful name, right?) and then in the Spans loop, I tell the WTVM what my window looks like.  Next up, I ask Lucene&#8217;s IndexReader for the TermVector and pass in my TermVectorMapper.  Finally, I print out the entries.</p>
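<p>The window arithmetic is easy to get off by one, so here is a small worked sketch.  It assumes the span&#8217;s end is one past its last matching position, which is how a single-term span reports it; the WindowBounds class is a hypothetical helper, and clamping the start at zero is an addition that the code above does not perform (a negative start is harmless there, since no position is negative).</p>

```java
// Hypothetical helper for the window arithmetic; not part of Lucene.
class WindowBounds {
  /**
   * Returns {start, end} for a half-open window [start, end) that extends
   * `window` positions on each side of the span [spanStart, spanEnd).
   */
  static int[] around(int spanStart, int spanEnd, int window) {
    int start = Math.max(0, spanStart - window); // clamp: positions are never negative
    int end = spanEnd + window;                  // still exclusive
    return new int[] {start, end};
  }

  /** True if the position falls inside the half-open window. */
  static boolean contains(int[] bounds, int position) {
    return position >= bounds[0] && position < bounds[1];
  }

  public static void main(String[] args) {
    // A one-token match at position 6 is the span [6, 7).
    int[] bounds = around(6, 7, 2);
    System.out.println("window: [" + bounds[0] + ", " + bounds[1] + ")");
    for (int p = 3; p <= 9; p++) {
      System.out.println(p + " in window? " + contains(bounds, p));
    }
  }
}
```

<p>For a match at position 6 and a window of 2, the bounds come out as [4, 9), so positions 4 through 8 are kept: two tokens on either side of the match.</p>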
<p>Of course, the last bit of useful info is what the WTVM looks like.  Here&#8217;s the most useful snippet of code:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> map<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> term, <span style="color: #000066; font-weight: bold;">int</span> frequency, TermVectorOffsetInfo<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> offsets, <span style="color: #000066; font-weight: bold;">int</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> positions<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> positions.<span style="color: #006633;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><span style="color: #666666; font-style: italic;">//unfortunately, we still have to loop over the positions</span>
      <span style="color: #666666; font-style: italic;">//start is inclusive, end is exclusive</span>
      <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>positions<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&gt;=</span> start <span style="color: #339933;">&amp;&amp;</span> positions<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;</span> end<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
        WindowEntry entry <span style="color: #339933;">=</span> entries.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span>term<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>entry <span style="color: #339933;">==</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
          entry <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> WindowEntry<span style="color: #009900;">&#40;</span>term<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
          entries.<span style="color: #006633;">put</span><span style="color: #009900;">&#40;</span>term, entry<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        entry.<span style="color: #006633;">positions</span>.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>positions<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span></pre></div></div>

<p>As you can see, I just look at the positions and check whether the current term has a position that falls inside the window defined by start and end.  Obviously, you can do more interesting things here, but I&#8217;ll leave that up to you.  Also know that there are a few TermVectorMapper implementations in the Lucene distribution that you can use as examples.</p>
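<p>As one example of a &#8220;more interesting thing&#8221;, the collected entries could be flipped from term-to-positions into position-to-term so the window reads in document order.  This is a sketch under assumptions: the WindowInOrder class is hypothetical, and a plain Map of term to positions stands in for the WindowEntry objects.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical post-processing step; a Map<String, List<Integer>> stands in for the entries.
class WindowInOrder {
  /** Returns the window's terms sorted by position, repeating a term once per position. */
  static List<String> inDocumentOrder(Map<String, List<Integer>> entries) {
    SortedMap<Integer, String> byPosition = new TreeMap<>();
    for (Map.Entry<String, List<Integer>> e : entries.entrySet()) {
      for (int position : e.getValue()) {
        byPosition.put(position, e.getKey());
      }
    }
    // TreeMap iterates in ascending key order, i.e. ascending position.
    return new ArrayList<>(byPosition.values());
  }
}
```

<p>If two terms share a position (synonyms injected at the same position, for instance), this simple version keeps only the last one seen; a map from position to a list of terms would preserve both.</p>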
<p>That about wraps it up.  From here, one can easily imagine different ways to utilize the information returned from the Term Vector Mapper to process information about the terms in a window.  </p>
<p>The full code is below.  It is intended for demonstration purposes only.  Please note the disclaimers, etc.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">com.lucidimagination.noodles</span><span style="color: #339933;">;</span>
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the &quot;License&quot;); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an &quot;AS IS&quot; BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.store.RAMDirectory</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.index.IndexWriter</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.index.Term</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.index.IndexReader</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.index.TermVectorMapper</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.index.TermVectorOffsetInfo</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.analysis.standard.StandardAnalyzer</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.document.Document</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.document.Field</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.IndexSearcher</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.TopDocs</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.ScoreDoc</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.spans.SpanTermQuery</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.lucene.search.spans.Spans</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.IOException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.LinkedHashMap</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.List</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.ArrayList</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * This class is for demonstration purposes only. No warranty, guarantee, etc. is implied.
 *
 * This is not production-quality code!
 */</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> TermVectorFun <span style="color: #009900;">&#123;</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> DOCS <span style="color: #339933;">=</span> <span style="color: #009900;">&#123;</span>
          <span style="color: #0000ff;">&quot;The quick red fox jumped over the lazy brown dogs.&quot;</span>,
          <span style="color: #0000ff;">&quot;Mary had a little lamb whose fleece was white as snow.&quot;</span>,
          <span style="color: #0000ff;">&quot;Moby Dick is a story of a whale and a man obsessed.&quot;</span>,
          <span style="color: #0000ff;">&quot;The robber wore a black fleece jacket and a baseball cap.&quot;</span>,
          <span style="color: #0000ff;">&quot;The English Springer Spaniel is the best of all dogs.&quot;</span>
  <span style="color: #009900;">&#125;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> main<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> args<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span> <span style="color: #009900;">&#123;</span>
    RAMDirectory ramDir <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> RAMDirectory<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">//Index some made up content</span>
    IndexWriter writer <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IndexWriter<span style="color: #009900;">&#40;</span>ramDir, <span style="color: #000000; font-weight: bold;">new</span> StandardAnalyzer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #000066; font-weight: bold;">true</span>, IndexWriter.<span style="color: #006633;">MaxFieldLength</span>.<span style="color: #006633;">UNLIMITED</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> DOCS.<span style="color: #006633;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #003399;">Document</span> doc <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Document</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">Field</span> id <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Field</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;id&quot;</span>, <span style="color: #0000ff;">&quot;doc_&quot;</span> <span style="color: #339933;">+</span> i, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Store</span>.<span style="color: #006633;">YES</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Index</span>.<span style="color: #006633;">NOT_ANALYZED_NO_NORMS</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      doc.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>id<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #666666; font-style: italic;">//Store both position and offset information</span>
      <span style="color: #003399;">Field</span> text <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Field</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;content&quot;</span>, DOCS<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Store</span>.<span style="color: #006633;">NO</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">Index</span>.<span style="color: #006633;">ANALYZED</span>, <span style="color: #003399;">Field</span>.<span style="color: #006633;">TermVector</span>.<span style="color: #006633;">WITH_POSITIONS_OFFSETS</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      doc.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>text<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      writer.<span style="color: #006633;">addDocument</span><span style="color: #009900;">&#40;</span>doc<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    writer.<span style="color: #006633;">close</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">//Get a searcher</span>
    IndexSearcher searcher <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> IndexSearcher<span style="color: #009900;">&#40;</span>ramDir<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #666666; font-style: italic;">// Do a search using SpanQuery</span>
    SpanTermQuery fleeceQ <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> SpanTermQuery<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Term<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;content&quot;</span>, <span style="color: #0000ff;">&quot;fleece&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    TopDocs results <span style="color: #339933;">=</span> searcher.<span style="color: #006633;">search</span><span style="color: #009900;">&#40;</span>fleeceQ, <span style="color: #cc66cc;">10</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> results.<span style="color: #006633;">scoreDocs</span>.<span style="color: #006633;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      ScoreDoc scoreDoc <span style="color: #339933;">=</span> results.<span style="color: #006633;">scoreDocs</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
      <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Score Doc: &quot;</span> <span style="color: #339933;">+</span> scoreDoc<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    IndexReader reader <span style="color: #339933;">=</span> searcher.<span style="color: #006633;">getIndexReader</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    Spans spans <span style="color: #339933;">=</span> fleeceQ.<span style="color: #006633;">getSpans</span><span style="color: #009900;">&#40;</span>reader<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    WindowTermVectorMapper tvm <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> WindowTermVectorMapper<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">int</span> window <span style="color: #339933;">=</span> <span style="color: #cc66cc;">2</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//get the words within two positions of the match</span>
    <span style="color: #000000; font-weight: bold;">while</span> <span style="color: #009900;">&#40;</span>spans.<span style="color: #006633;">next</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Doc: &quot;</span> <span style="color: #339933;">+</span> spans.<span style="color: #006633;">doc</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; Start: &quot;</span> <span style="color: #339933;">+</span> spans.<span style="color: #006633;">start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; End: &quot;</span> <span style="color: #339933;">+</span> spans.<span style="color: #006633;">end</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #666666; font-style: italic;">//build up the window</span>
      tvm.<span style="color: #006633;">start</span> <span style="color: #339933;">=</span> spans.<span style="color: #006633;">start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">-</span> window<span style="color: #339933;">;</span>
      tvm.<span style="color: #006633;">end</span> <span style="color: #339933;">=</span> spans.<span style="color: #006633;">end</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> window<span style="color: #339933;">;</span>
      reader.<span style="color: #006633;">getTermFreqVector</span><span style="color: #009900;">&#40;</span>spans.<span style="color: #006633;">doc</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;content&quot;</span>, tvm<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span>WindowEntry entry <span style="color: #339933;">:</span> tvm.<span style="color: #006633;">entries</span>.<span style="color: #006633;">values</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Entry: &quot;</span> <span style="color: #339933;">+</span> entry<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #009900;">&#125;</span>
      <span style="color: #666666; font-style: italic;">//clear out the entries for the next round</span>
      tvm.<span style="color: #006633;">entries</span>.<span style="color: #006633;">clear</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #666666; font-style: italic;">//release the searcher and the reader it opened</span>
    searcher.<span style="color: #006633;">close</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">//Not thread-safe</span>
<span style="color: #000000; font-weight: bold;">class</span> WindowTermVectorMapper <span style="color: #000000; font-weight: bold;">extends</span> TermVectorMapper <span style="color: #009900;">&#123;</span>
&nbsp;
  <span style="color: #000066; font-weight: bold;">int</span> start<span style="color: #339933;">;</span>
  <span style="color: #000066; font-weight: bold;">int</span> end<span style="color: #339933;">;</span>
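  LinkedHashMap<span style="color: #339933;">&lt;</span><span style="color: #003399;">String</span>, WindowEntry<span style="color: #339933;">&gt;</span> entries <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> LinkedHashMap<span style="color: #339933;">&lt;</span><span style="color: #003399;">String</span>, WindowEntry<span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>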
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> map<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> term, <span style="color: #000066; font-weight: bold;">int</span> frequency, TermVectorOffsetInfo<span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> offsets, <span style="color: #000066; font-weight: bold;">int</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> positions<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">int</span> i <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span> i <span style="color: #339933;">&lt;</span> positions.<span style="color: #006633;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span><span style="color: #666666; font-style: italic;">//unfortunately, we still have to loop over the positions</span>
      <span style="color: #666666; font-style: italic;">//start is inclusive; end is exclusive, because Spans.end() is one past the last matching position</span>
      <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>positions<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&gt;=</span> start <span style="color: #339933;">&amp;&amp;</span> positions<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;</span> end<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
        WindowEntry entry <span style="color: #339933;">=</span> entries.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span>term<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>entry <span style="color: #339933;">==</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
          entry <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> WindowEntry<span style="color: #009900;">&#40;</span>term<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
          entries.<span style="color: #006633;">put</span><span style="color: #009900;">&#40;</span>term, entry<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        entry.<span style="color: #006633;">positions</span>.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>positions<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> setExpectations<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> field, <span style="color: #000066; font-weight: bold;">int</span> numTerms, <span style="color: #000066; font-weight: bold;">boolean</span> storeOffsets, <span style="color: #000066; font-weight: bold;">boolean</span> storePositions<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #666666; font-style: italic;">// do nothing for this example</span>
    <span style="color: #666666; font-style: italic;">//See also the PositionBasedTermVectorMapper.</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">class</span> WindowEntry<span style="color: #009900;">&#123;</span>
  <span style="color: #003399;">String</span> term<span style="color: #339933;">;</span>
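  <span style="color: #003399;">List</span><span style="color: #339933;">&lt;</span><span style="color: #003399;">Integer</span><span style="color: #339933;">&gt;</span> positions <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">ArrayList</span><span style="color: #339933;">&lt;</span><span style="color: #003399;">Integer</span><span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span><span style="color: #666666; font-style: italic;">//a term could appear more than once within the window</span>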
&nbsp;
  WindowEntry<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> term<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">this</span>.<span style="color: #006633;">term</span> <span style="color: #339933;">=</span> term<span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  @Override
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">String</span> toString<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #0000ff;">&quot;WindowEntry{&quot;</span> <span style="color: #339933;">+</span>
            <span style="color: #0000ff;">&quot;term='&quot;</span> <span style="color: #339933;">+</span> term <span style="color: #339933;">+</span> <span style="color: #0000ff;">'<span style="color: #000099; font-weight: bold;">\'</span>'</span> <span style="color: #339933;">+</span>
            <span style="color: #0000ff;">&quot;, positions=&quot;</span> <span style="color: #339933;">+</span> positions <span style="color: #339933;">+</span>
            <span style="color: #0000ff;">'}'</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
	</channel>
</rss>

