How to get the matching spans of a Span Term Query in Lucene 5?

Tags: lucene

In Lucene, the advised way to get the words around a term is to use Span Queries. There is a good walkthrough at http://lucidworks.com/blog/accessing-words-around-a-positional-match-in-lucene/

The spans are supposed to be accessed using the getSpans() method.

SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "fleece"));
Spans spans = fleeceQ.getSpans(searcher.getIndexReader());

Then in Lucene 4 the API changed and the getSpans() method became more complex. Finally, in the latest Lucene release (5.3.0), this method was removed from the query (apparently moved to the SpanWeight class).

So, which is the current way of accessing spans matched by a span term query?

Julián Solórzano asked Oct 30 '22 20:10


1 Answer

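The spans are now obtained from the SpanWeight that the query creates, rather than from the query itself. Here reader is your IndexReader and is is an IndexSearcher over it.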

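// Wrap the (possibly multi-segment) reader so it can be treated as a single leaf.
LeafReader pseudoAtomicReader = SlowCompositeReaderWrapper.wrap(reader);
Term term = new Term("field", "fox");
SpanTermQuery spanTermQuery = new SpanTermQuery(term);
// 'is' is the IndexSearcher; false means scores are not needed.
SpanWeight spanWeight = spanTermQuery.createWeight(is, false);
// Postings is an enum nested in SpanWeight. The result can be null if the term is absent from this leaf.
Spans spans = spanWeight.getSpans(pseudoAtomicReader.getContext(), SpanWeight.Postings.POSITIONS);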

Iterating over the spans with spans.next() is also gone in version 5.3 of Lucene. Instead, step through the matching documents with nextDoc() and, inside each document, through the matches with nextStartPosition():

int nxtDoc;
while ((nxtDoc = spans.nextDoc()) != Spans.NO_MORE_DOCS) {
  System.out.println(spans.toString());
  System.out.println("doc_id=" + nxtDoc);
  Document doc = reader.document(nxtDoc);
  System.out.println(doc.getField("field"));
  // A document may contain several matches, so visit every position in it.
  while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
    System.out.println(spans.startPosition());
    System.out.println(spans.endPosition());
  }
}
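
Once the loop above gives you start and end positions, getting the words around a match comes down to slicing the field's token sequence. The following is only a sketch in plain Java with no Lucene calls. It assumes you have already recovered the field's tokens in position order (for example from term vectors, or by re-analyzing the stored text), that there are no position gaps, and that the end position is exclusive, as it is for a single-term span. The class name SpanContextWindow, the method window and the parameter width are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class SpanContextWindow {

    /**
     * Returns the span plus up to 'width' tokens on each side of it.
     * 'start' is inclusive and 'end' is exclusive (assumed to come from
     * startPosition() and endPosition() of the current match).
     */
    static List<String> window(List<String> tokens, int start, int end, int width) {
        // Clamp the window so it never runs past either end of the document.
        int from = Math.max(0, start - width);
        int to = Math.min(tokens.size(), end + width);
        return tokens.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog");
        // A single-term match on "fox" would be reported as start=3, end=4.
        System.out.println(window(tokens, 3, 4, 2));
        // prints [quick, brown, fox, jumped, over]
    }
}
```

If the analyzer removes stop words or otherwise leaves position gaps, list indexes no longer line up with positions, and you would need a position-to-token map instead of a plain list.
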
Apurv answered Nov 23 '22 00:11