<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Accessing words around a positional match in Lucene</title>
	<atom:link href="http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/</link>
	<description>Exclusively dedicated to Apache Lucene/Solr open source search technology</description>
	<lastBuildDate>Sat, 04 Feb 2012 01:13:03 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
	<item>
		<title>By: Sujit Pal</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-7924</link>
		<dc:creator>Sujit Pal</dc:creator>
		<pubDate>Wed, 24 Aug 2011 06:16:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-7924</guid>
		<description>Thanks for the example Grant. I adapted your code to find concordances - the concordance was then culled manually to provide input for developing a component to automatically detect age group patterns in a document. 

Here is a link if you are interested:
http://sujitpal.blogspot.com/2011/08/implementing-concordance-with-lucene.html</description>
		<content:encoded><![CDATA[<p>Thanks for the example Grant. I adapted your code to find concordances &#8211; the concordance was then culled manually to provide input for developing a component to automatically detect age group patterns in a document. </p>
<p>Here is a link if you are interested:<br />
<a href="http://sujitpal.blogspot.com/2011/08/implementing-concordance-with-lucene.html" rel="nofollow">http://sujitpal.blogspot.com/2011/08/implementing-concordance-with-lucene.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kiran Umadi</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-7682</link>
		<dc:creator>Kiran Umadi</dc:creator>
		<pubDate>Fri, 10 Jun 2011 07:52:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-7682</guid>
		<description>Thanks a lot, this is really helpful for people who are interested in Lucene.</description>
		<content:encoded><![CDATA[<p>Thanks a lot, this is really helpful for people who are interested in Lucene.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Grant Ingersoll</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-5593</link>
		<dc:creator>Grant Ingersoll</dc:creator>
		<pubDate>Wed, 11 Aug 2010 13:42:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-5593</guid>
		<description>Sure, have a look at using the SpanNearQuery, from which you can get positions, etc.</description>
		<content:encoded><![CDATA[<p>Sure, have a look at using the SpanNearQuery, from which you can get positions, etc.</p>
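<p>As a rough sketch of what those positions look like for an exact phrase, here is a plain-Java illustration with no Lucene calls. The class and method names are made up for this example, and whitespace splitting stands in for a real analyzer. It follows the same convention as the TermVectorFun example: the start position is inclusive and the end is one past the last matched word, so the window covers [start - window, end + window).</p>

```java
public class PhraseWindowSketch {

  // Returns the first position at or after 'from' where 'phrase' occurs
  // as consecutive tokens, or -1 if it does not occur.
  public static int findPhrase(String[] tokens, String[] phrase, int from) {
    for (int i = from; i + phrase.length <= tokens.length; i++) {
      boolean match = true;
      for (int j = 0; j < phrase.length; j++) {
        if (!tokens[i + j].equals(phrase[j])) {
          match = false;
          break;
        }
      }
      if (match) {
        return i;
      }
    }
    return -1;
  }

  // Joins the tokens in [start, end) with spaces, clamping to the array bounds.
  public static String joinWindow(String[] tokens, int start, int end) {
    int lo = Math.max(0, start);
    int hi = Math.min(tokens.length, end);
    StringBuilder sb = new StringBuilder();
    for (int i = lo; i < hi; i++) {
      if (sb.length() > 0) {
        sb.append(' ');
      }
      sb.append(tokens[i]);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Naive tokenization; a real analyzer would also lowercase and drop stop words.
    String[] tokens = "the robber wore a black fleece jacket and a baseball cap".split(" ");
    String[] phrase = {"black", "fleece"};
    int window = 2; // words to keep on each side of the phrase

    int start = findPhrase(tokens, phrase, 0);
    if (start < 0) {
      System.out.println("Phrase not found");
      return;
    }
    int end = start + phrase.length; // exclusive, one past the last phrase word
    System.out.println("Start: " + start + " End: " + end);
    System.out.println("Window: " + joinWindow(tokens, start - window, end + window));
  }
}
```

<p>With these inputs the phrase starts at position 4 and ends at 6, and the window is "wore a black fleece jacket and". In Lucene itself, the start and end would come from the Spans of a SpanNearQuery rather than from a scan like this.</p>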
]]></content:encoded>
	</item>
	<item>
		<title>By: Diman</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-5590</link>
		<dc:creator>Diman</dc:creator>
		<pubDate>Wed, 11 Aug 2010 00:31:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-5590</guid>
		<description>Hello Grant!!

Thanks again for this great example. 
I am trying to build code that does the same for an exact &quot;phrase&quot; match. Is there any way to get the words around a phrase, or at least the start and end positions of a phrase?</description>
		<content:encoded><![CDATA[<p>Hello Grant!!</p>
<p>Thanks again for this great example.<br />
I am trying to build code that does the same for an exact &#8220;phrase&#8221; match. Is there any way to get the words around a phrase, or at least the start and end positions of a phrase?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Diman</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-5553</link>
		<dc:creator>Diman</dc:creator>
		<pubDate>Thu, 05 Aug 2010 13:13:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-5553</guid>
		<description>Wow!! Amazing, it works now!!

Thanks a lot Grant!!

P.S.: It works with every Lucene release from 2.4.1 to 3.0.1.</description>
		<content:encoded><![CDATA[<p>Wow!! Amazing, it works now!!</p>
<p>Thanks a lot Grant!!</p>
<p>P.S.: It works with every Lucene release from 2.4.1 to 3.0.1.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Grant Ingersoll</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-5552</link>
		<dc:creator>Grant Ingersoll</dc:creator>
		<pubDate>Thu, 05 Aug 2010 12:44:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-5552</guid>
		<description>I think the brackets are being stripped.

&lt;pre&gt;
package com.lucidimagination.noodles;
/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the &quot;License&quot;); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an &quot;AS IS&quot; BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermVectorMapper;
import org.apache.lucene.index.TermVectorOffsetInfo;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.spans.Spans;
import org.apache.lucene.util.Version;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.ArrayList;


/**
 * This class is for demonstration purposes only.  No warranty, guarantee, etc. is implied.
 * &lt;p/&gt;
 * This is not production quality code!
 */
public class TermVectorFun {
  public static String[] DOCS = {
          &quot;The quick red fox jumped over the lazy brown dogs.&quot;,
          &quot;Mary had a little lamb whose fleece was white as snow.&quot;,
          &quot;Moby Dick is a story of a whale and a man obsessed.&quot;,
          &quot;The robber wore a black fleece jacket and a baseball cap.&quot;,
          &quot;The English Springer Spaniel is the best of all dogs.&quot;
  };

  public static void main(String[] args) throws IOException {
    RAMDirectory ramDir = new RAMDirectory();
    //Index some made up content
    IndexWriter writer = new IndexWriter(ramDir, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
    for (int i = 0; i &lt; DOCS.length; i++) {
      Document doc = new Document();
      Field id = new Field(&quot;id&quot;, &quot;doc_&quot; + i, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
      doc.add(id);
      //Store both position and offset information
      Field text = new Field(&quot;content&quot;, DOCS[i], Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
      doc.add(text);
      writer.addDocument(doc);
    }
    writer.close();
    //Get a searcher
    IndexSearcher searcher = new IndexSearcher(ramDir);
    // Do a search using SpanQuery
    SpanTermQuery fleeceQ = new SpanTermQuery(new Term(&quot;content&quot;, &quot;fleece&quot;));
    TopDocs results = searcher.search(fleeceQ, 10);
    for (int i = 0; i &lt; results.scoreDocs.length; i++) {
      ScoreDoc scoreDoc = results.scoreDocs[i];
      System.out.println(&quot;Score Doc: &quot; + scoreDoc);
    }
    IndexReader reader = searcher.getIndexReader();
    Spans spans = fleeceQ.getSpans(reader);
    WindowTermVectorMapper tvm = new WindowTermVectorMapper();
    int window = 2;//get the words within two of the match
    while (spans.next()) {
      System.out.println(&quot;Doc: &quot; + spans.doc() + &quot; Start: &quot; + spans.start() + &quot; End: &quot; + spans.end());
      //build up the window
      tvm.start = spans.start() - window;
      tvm.end = spans.end() + window;
      reader.getTermFreqVector(spans.doc(), &quot;content&quot;, tvm);
      for (WindowEntry entry : tvm.entries.values()) {
        System.out.println(&quot;Entry: &quot; + entry);
      }
      //clear out the entries for the next round
      tvm.entries.clear();
    }
  }

}

//Not thread-safe
class WindowTermVectorMapper extends TermVectorMapper {


  int start;
  int end;
  LinkedHashMap&lt;String, WindowEntry&gt; entries = new LinkedHashMap&lt;String, WindowEntry&gt;();

  public void map(String term, int frequency, TermVectorOffsetInfo[] offsets, int[] positions) {
    for (int i = 0; i &lt; positions.length; i++) {//unfortunately, we still have to loop over the positions
      //keep positions in [start, end): start is inclusive, end is exclusive because Spans.end() is one past the match
      if (positions[i] &gt;= start &amp;&amp; positions[i] &lt; end) {
        WindowEntry entry = entries.get(term);
        if (entry == null) {
          entry = new WindowEntry(term);
          entries.put(term, entry);
        }
        entry.positions.add(positions[i]);
      }
    }
  }

  public void setExpectations(String field, int numTerms, boolean storeOffsets, boolean storePositions) {
    // do nothing for this example
    //See also the PositionBasedTermVectorMapper.
  }

}

class WindowEntry {
  String term;
  List&lt;Integer&gt; positions = new ArrayList&lt;Integer&gt;();//a term could appear more than once within the window

  WindowEntry(String term) {
    this.term = term;
  }

  @Override
  public String toString() {
    return &quot;WindowEntry{&quot; +
            &quot;term=&#039;&quot; + term + &#039;\&#039;&#039; +
            &quot;, positions=&quot; + positions +
            &#039;}&#039;;
  }
}
&lt;/pre&gt;
</description>
		<content:encoded><![CDATA[<p>I think the brackets are being stripped.</p>
<pre>
package com.lucidimagination.noodles;
/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     <a href="http://www.apache.org/licenses/LICENSE-2.0" rel="nofollow">http://www.apache.org/licenses/LICENSE-2.0</a>
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermVectorMapper;
import org.apache.lucene.index.TermVectorOffsetInfo;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.spans.Spans;
import org.apache.lucene.util.Version;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.ArrayList;

/**
 * This class is for demonstration purposes only.  No warranty, guarantee, etc. is implied.
 * &lt;p/&gt;
 * This is not production quality code!
 */
public class TermVectorFun {
  public static String[] DOCS = {
          "The quick red fox jumped over the lazy brown dogs.",
          "Mary had a little lamb whose fleece was white as snow.",
          "Moby Dick is a story of a whale and a man obsessed.",
          "The robber wore a black fleece jacket and a baseball cap.",
          "The English Springer Spaniel is the best of all dogs."
  };

  public static void main(String[] args) throws IOException {
    RAMDirectory ramDir = new RAMDirectory();
    //Index some made up content
    IndexWriter writer = new IndexWriter(ramDir, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED);
    for (int i = 0; i &lt; DOCS.length; i++) {
      Document doc = new Document();
      Field id = new Field("id", "doc_" + i, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
      doc.add(id);
      //Store both position and offset information
      Field text = new Field("content", DOCS[i], Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
      doc.add(text);
      writer.addDocument(doc);
    }
    writer.close();
    //Get a searcher
    IndexSearcher searcher = new IndexSearcher(ramDir);
    // Do a search using SpanQuery
    SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "fleece"));
    TopDocs results = searcher.search(fleeceQ, 10);
    for (int i = 0; i &lt; results.scoreDocs.length; i++) {
      ScoreDoc scoreDoc = results.scoreDocs[i];
      System.out.println("Score Doc: " + scoreDoc);
    }
    IndexReader reader = searcher.getIndexReader();
    Spans spans = fleeceQ.getSpans(reader);
    WindowTermVectorMapper tvm = new WindowTermVectorMapper();
    int window = 2;//get the words within two of the match
    while (spans.next()) {
      System.out.println("Doc: " + spans.doc() + " Start: " + spans.start() + " End: " + spans.end());
      //build up the window
      tvm.start = spans.start() - window;
      tvm.end = spans.end() + window;
      reader.getTermFreqVector(spans.doc(), "content", tvm);
      for (WindowEntry entry : tvm.entries.values()) {
        System.out.println("Entry: " + entry);
      }
      //clear out the entries for the next round
      tvm.entries.clear();
    }
  }

}

//Not thread-safe
class WindowTermVectorMapper extends TermVectorMapper {

  int start;
  int end;
  LinkedHashMap&lt;String, WindowEntry&gt; entries = new LinkedHashMap&lt;String, WindowEntry&gt;();

  public void map(String term, int frequency, TermVectorOffsetInfo[] offsets, int[] positions) {
    for (int i = 0; i &lt; positions.length; i++) {//unfortunately, we still have to loop over the positions
      //keep positions in [start, end): start is inclusive, end is exclusive because Spans.end() is one past the match
      if (positions[i] &gt;= start &#038;&#038; positions[i] &lt; end) {
        WindowEntry entry = entries.get(term);
        if (entry == null) {
          entry = new WindowEntry(term);
          entries.put(term, entry);
        }
        entry.positions.add(positions[i]);
      }
    }
  }

  public void setExpectations(String field, int numTerms, boolean storeOffsets, boolean storePositions) {
    // do nothing for this example
    //See also the PositionBasedTermVectorMapper.
  }

}

class WindowEntry {
  String term;
  List&lt;Integer&gt; positions = new ArrayList&lt;Integer&gt;();//a term could appear more than once within the window

  WindowEntry(String term) {
    this.term = term;
  }

  @Override
  public String toString() {
    return "WindowEntry{" +
            "term='" + term + '\'' +
            ", positions=" + positions +
            '}';
  }
}
</pre>
]]></content:encoded>
	</item>
	<item>
		<title>By: Diman</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-5551</link>
		<dc:creator>Diman</dc:creator>
		<pubDate>Thu, 05 Aug 2010 11:58:27 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-5551</guid>
		<description>Error code lines (updated):
“class TermVectorFun”
       for (WindowEntry entry : tvm.entries.values())

//---------------------------------------------------//

&quot;class WindowTermVectorMapper&quot;
    WindowEntry entry = entries.get(term);</description>
		<content:encoded><![CDATA[<p>Error code lines (updated):<br />
“class TermVectorFun”<br />
       for (WindowEntry entry : tvm.entries.values())</p>
<p>//&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;//</p>
<p>&#8220;class WindowTermVectorMapper&#8221;<br />
    WindowEntry entry = entries.get(term);</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Diman</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-5550</link>
		<dc:creator>Diman</dc:creator>
		<pubDate>Thu, 05 Aug 2010 11:52:52 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-5550</guid>
		<description>Error code lines (updated):
&quot;class TermVectorFun&quot;

–-!&gt; for (WindowEntry entry : tvm.entries.values())  WindowEntry entry = entries.get(term); &lt;!–-</description>
		<content:encoded><![CDATA[<p>Error code lines (updated):<br />
&#8220;class TermVectorFun&#8221;</p>
<p>–-!&gt; for (WindowEntry entry : tvm.entries.values())  WindowEntry entry = entries.get(term); &lt;!–-</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Diman</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-5549</link>
		<dc:creator>Diman</dc:creator>
		<pubDate>Thu, 05 Aug 2010 10:04:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-5549</guid>
		<description>Error code lines:

class WindowTermVectorMapper
......

--&gt; for (WindowEntry entry : tvm.entries.values())  WindowEntry entry =  entries.get(term); &lt;--</description>
		<content:encoded><![CDATA[<p>Error code lines:</p>
<p>class WindowTermVectorMapper<br />
&#8230;&#8230;</p>
<p>&#8211;&gt; for (WindowEntry entry : tvm.entries.values())  WindowEntry entry =  entries.get(term); &lt;&#8211;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Diman</title>
		<link>http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/comment-page-1/#comment-5544</link>
		<dc:creator>Diman</dc:creator>
		<pubDate>Thu, 05 Aug 2010 00:37:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.lucidimagination.com/blog/?p=672#comment-5544</guid>
		<description>Thanks, but the code you posted is identical to the example. My declarations are identical as well, including:
LinkedHashMap entries = new LinkedHashMap();

And according to the exception, &quot;entry&quot; and &quot;entries&quot; are incompatible types.

Unfortunately, I still have errors in

class WindowTermVectorMapper
.........

        if (positions[i] &gt;= start 
&amp;&amp; positions[i]  WindowEntry entry = entries.get(term);  for (WindowEntry entry : tvm.entries.values()) 
{
        System.out.println(&quot;Entry: &quot; + entry);
      }
      //clear out the entries for the next round
      tvm.entries.clear();
    }

Maybe it is because of the toString() override in class WindowEntry?</description>
		<content:encoded><![CDATA[<p>Thanks, but the code you posted is identical to the example. My declarations are identical as well, including:<br />
LinkedHashMap entries = new LinkedHashMap();</p>
<p>And according to the exception, &#8220;entry&#8221; and &#8220;entries&#8221; are incompatible types.</p>
<p>Unfortunately, I still have errors in</p>
<p>class WindowTermVectorMapper<br />
&#8230;&#8230;&#8230;</p>
<p>        if (positions[i] &gt;= start<br />
&amp;&amp; positions[i]  WindowEntry entry = entries.get(term);  for (WindowEntry entry : tvm.entries.values())<br />
{<br />
        System.out.println(&#8220;Entry: &#8221; + entry);<br />
      }<br />
      //clear out the entries for the next round<br />
      tvm.entries.clear();<br />
    }</p>
<p>Maybe it is because of the toString() override in class WindowEntry?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

