Accessing words around a positional match in Lucene

From time to time, users on the Lucene mailing list ask a variant of the following question:

Given a term match in a document, what’s the best way to get a window of words around that match?

A window of words around a match is useful for many tasks, including:

  1. Highlighting (although I’d recommend using Lucene’s Highlighter package for that)
  2. Co-occurrence analysis
  3. Sentiment analysis
  4. Question answering

Unfortunately, given how inverted indexes are structured, retrieving…
