In Lucene, the recommended way to get the words around a term is to use span queries. There is a good walkthrough at http://lucidworks.com/blog/accessing-words-around-a-positional-match-in-lucene/
Originally, the spans were obtained with the query's getSpans() method:
```java
SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "fleece"));
Spans spans = fleeceQ.getSpans(searcher.getIndexReader());
```
In Lucene 4 the API changed and getSpans() became more complex, and in the latest release (5.3.0) the method was removed from the query altogether (apparently it moved to the SpanWeight class).
So what is the current way to access the spans matched by a SpanTermQuery?
In Lucene 5.3 you create a SpanWeight from the query and ask the weight for the spans:
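```java
// Flatten all segments into one leaf so a single getSpans() call covers the whole index
LeafReader pseudoAtomicReader = SlowCompositeReaderWrapper.wrap(reader);
// The weight must come from a searcher over the same reader whose context is passed to getSpans()
IndexSearcher is = new IndexSearcher(pseudoAtomicReader);
Term term = new Term("field", "fox");
SpanTermQuery spanTermQuery = new SpanTermQuery(term);
SpanWeight spanWeight = spanTermQuery.createWeight(is, false); // false: scores are not needed
// getSpans() returns null if the term does not occur in the index
Spans spans = spanWeight.getSpans(pseudoAtomicReader.getContext(), SpanWeight.Postings.POSITIONS);
```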
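SlowCompositeReaderWrapper merges every segment into one slow view. A common alternative is to ask the weight for spans one segment at a time and add each segment's docBase to the segment-local document id. The sketch below assumes Lucene 5.3 (lucene-core, plus lucene-analyzers-common for WhitespaceAnalyzer); the class SpanPositions and its methods collect and demo are hypothetical names, and demo only builds a one-document in-memory index to exercise collect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.search.spans.SpanWeight;
import org.apache.lucene.search.spans.Spans;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class SpanPositions {

    /** Returns "docId:start-end" for every match of field:text, visiting each segment in turn. */
    public static List<String> collect(IndexSearcher searcher, String field, String text) throws IOException {
        SpanTermQuery query = new SpanTermQuery(new Term(field, text));
        // needsScores = false: only positions are needed
        SpanWeight weight = query.createWeight(searcher, false);
        List<String> matches = new ArrayList<>();
        for (LeafReaderContext leaf : searcher.getIndexReader().leaves()) {
            Spans spans = weight.getSpans(leaf, SpanWeight.Postings.POSITIONS);
            if (spans == null) {
                continue; // the term does not occur in this segment
            }
            while (spans.nextDoc() != Spans.NO_MORE_DOCS) {
                int docId = leaf.docBase + spans.docID(); // segment-local id -> index-wide id
                while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
                    matches.add(docId + ":" + spans.startPosition() + "-" + spans.endPosition());
                }
            }
        }
        return matches;
    }

    /** Indexes one document in memory and returns the span positions of "fox". */
    public static List<String> demo() {
        try (Directory dir = new RAMDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
                Document doc = new Document();
                doc.add(new TextField("field", "the quick brown fox jumps over the lazy fox", Field.Store.YES));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return collect(new IndexSearcher(reader), "field", "fox");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo text "fox" sits at token positions 3 and 8, so this should print [0:3-4, 0:8-9]; endPosition() is exclusive.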
Iterating with spans.next() is no longer supported in Lucene 5.3 either. Spans now iterates on two levels: nextDoc() advances to the next matching document, and nextStartPosition() advances to the next match inside that document:
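```java
int docId;
while ((docId = spans.nextDoc()) != Spans.NO_MORE_DOCS) {
    System.out.println("doc_id=" + docId);
    // SlowCompositeReaderWrapper exposes top-level doc ids, so reader.document() works directly
    Document doc = reader.document(docId);
    System.out.println(doc.get("field")); // stored value; null if the field is not stored
    // A document can contain several matches, so loop over all of them
    while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
        System.out.println(spans.startPosition() + " - " + spans.endPosition()); // end is exclusive
    }
}
```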
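Span positions are token positions, not character offsets, so getting the actual words around a match takes one more step. Assuming the field is stored and was analyzed so that every whitespace-separated word gets exactly one position (for example with WhitespaceAnalyzer; stemming or stop-word removal breaks this correspondence), a minimal sketch is to split the stored text and slice a window around [start, end). ContextWindow and around are hypothetical names.

```java
import java.util.Arrays;

public class ContextWindow {

    /**
     * Returns the words in [start - width, end + width) of storedText, where start/end
     * are span positions (end exclusive). Assumes one position per whitespace-separated word.
     */
    public static String around(String storedText, int start, int end, int width) {
        String[] words = storedText.trim().split("\\s+");
        int from = Math.max(0, start - width);
        int to = Math.min(words.length, end + width);
        if (from >= to) {
            return ""; // positions fall outside the text
        }
        return String.join(" ", Arrays.copyOfRange(words, from, to));
    }

    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog";
        // "fox" is at position 3, so its span is [3, 4)
        System.out.println(around(text, 3, 4, 2)); // quick brown fox jumps over
    }
}
```

Inside the position loop, around(doc.get("field"), spans.startPosition(), spans.endPosition(), 2) would return two words on either side of each match, under the analyzer assumption stated above.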