Sentence aware search with Lucene SpanQueries

Question

Is it possible to use a Lucene SpanQuery to find all occurrences where the terms "red" "green" and "blue" all appear within a single sentence?

My first (incomplete/incorrect) approach is to write an analyzer that places a special sentence-marker token at the beginning of each sentence, at the same position as the sentence's first word, and then to query for something similar to the following:

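import org.apache.lucene.index.Term;
// Lucene 4.x-8.x package; Lucene 9+ moved these classes to org.apache.lucene.queries.spans
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanNotQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Hypothetical values: the field name and the marker text are placeholders.
static final String FIELD = "body";
static final String MY_SPECIAL_SENTENCE_TOKEN = "_sentence_";

SpanQuery termsInSentence = new SpanNearQuery(
  new SpanQuery[] {
    new SpanTermQuery(new Term(FIELD, MY_SPECIAL_SENTENCE_TOKEN)),
    new SpanTermQuery(new Term(FIELD, "red")),
    new SpanTermQuery(new Term(FIELD, "green")),
    new SpanTermQuery(new Term(FIELD, "blue"))
  },
  Integer.MAX_VALUE, // slop: effectively unlimited (999999999999 does not fit in an int)
  false              // inOrder = false: the terms may appear in any order
);

SpanQuery nextSentence = new SpanTermQuery(new Term(FIELD, MY_SPECIAL_SENTENCE_TOKEN));

// Intended: drop any termsInSentence match that overlaps a sentence marker
SpanNotQuery notInNextSentence = new SpanNotQuery(termsInSentence, nextSentence);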

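The problem, of course, is that nextSentence isn't really the next sentence; it matches any sentence marker, including the marker of the very sentence that termsInSentence matches. Because that marker is one of termsInSentence's own clauses, it lies inside every match, so SpanNotQuery excludes every match. Therefore this won't work.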

My next approach is to write the analyzer so that it places the token before the sentence (that is, before the first word rather than at the same position as the first word). The problem with this is that I then have to account for the extra position that MY_SPECIAL_SENTENCE_TOKEN occupies. What's more, this will be especially bad at first, while I'm using a naive pattern to split sentences (e.g. split on /\.\s+[A-Z0-9]/), because I'll have to account for all of the (false) sentence markers when I search for U. S. S. Enterprise.
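Both problems can be reproduced with the JDK alone. The sketch below (no Lucene; the class, method and marker names are hypothetical) splits on the naive pattern from the question, rewritten with a lookbehind and lookahead so the period and the capital letter are not consumed, and then numbers the tokens the way the second analyzer would, with a marker taking its own position before every sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NaiveSplitDemo {
    // Hypothetical marker text standing in for MY_SPECIAL_SENTENCE_TOKEN.
    static final String MARKER = "_sentence_";

    /** The question's /\.\s+[A-Z0-9]/ as lookarounds, so "." and the capital are kept. */
    static List<String> splitSentences(String text) {
        return Arrays.asList(text.split("(?<=\\.)\\s+(?=[A-Z0-9])"));
    }

    /** Second approach: a marker takes its own position before every sentence. */
    static List<String> tokensWithMarkers(List<String> sentences) {
        List<String> tokens = new ArrayList<>(); // list index == token position
        for (String sentence : sentences) {
            tokens.add(MARKER);
            for (String word : sentence.split("\\s+")) {
                // strip punctuation and lowercase, like a very simple analyzer
                tokens.add(word.replaceAll("\\W", "").toLowerCase());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> sentences = splitSentences("We boarded the U. S. S. Enterprise. It was large.");
        System.out.println(sentences);
        // [We boarded the U., S., S., Enterprise., It was large.]

        List<String> tokens = tokensWithMarkers(sentences);
        System.out.println(tokens);
        // [_sentence_, we, boarded, the, u, _sentence_, s, _sentence_, s, _sentence_, enterprise, _sentence_, it, was, large]
    }
}
```

Here "U. S. S. Enterprise" is cut into four "sentences", and the extra markers leave "s" and "enterprise" two positions apart instead of one, so a phrase query for "u s s enterprise" would need slop to match.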

So... how should I approach this?

Mark Leighton Fisher · Accepted Answer

I would index each sentence as a Lucene document, including a field that records which source document the sentence came from. Depending on your source material, the overhead of one Lucene document per sentence may be acceptable.
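A rough sketch of this design, assuming one indexed unit per sentence plus a field naming its source document. To keep it runnable without Lucene, it uses a tiny in-memory inverted index in plain Java (SentenceIndex and its methods are hypothetical names). In Lucene itself, each sentence would be its own Document carrying the text field and a stored source-id field, and the query would be a BooleanQuery with three MUST TermQuery clauses. Because each unit is a single sentence, "all terms in the same unit" already means "all terms in the same sentence", so no marker tokens or span queries are needed.

```java
import java.util.*;

public class SentenceIndex {
    /** Source document id for each sentence id (the "source document" field). */
    private final List<String> sourceOf = new ArrayList<>();
    /** Inverted index: term -> ids of the sentences that contain it. */
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Index every sentence of a source document as its own unit. */
    void addDocument(String docId, List<String> sentences) {
        for (String sentence : sentences) {
            int sentenceId = sourceOf.size();
            sourceOf.add(docId);
            for (String word : sentence.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    postings.computeIfAbsent(word, k -> new HashSet<>()).add(sentenceId);
                }
            }
        }
    }

    /** Source documents with at least one sentence containing every term. */
    Set<String> docsWithAllInOneSentence(String... terms) {
        Set<Integer> hits = null;
        for (String term : terms) {
            Set<Integer> ids = postings.getOrDefault(term, Set.of());
            if (hits == null) hits = new HashSet<>(ids);
            else hits.retainAll(ids); // conjunction, like MUST clauses
        }
        Set<String> docs = new TreeSet<>();
        if (hits != null) {
            for (int id : hits) docs.add(sourceOf.get(id));
        }
        return docs;
    }

    public static void main(String[] args) {
        SentenceIndex index = new SentenceIndex();
        index.addDocument("doc1", List.of("Red, green and blue.", "Nothing else."));
        index.addDocument("doc2", List.of("Red and green.", "Blue is elsewhere."));
        System.out.println(index.docsWithAllInOneSentence("red", "green", "blue")); // [doc1]
    }
}
```

doc2 is not returned because its three colours are spread across two sentences. A source document with several qualifying sentences is still listed once, since hits are collected into a set of source ids.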

Tags: search, lucene, sentence

JnBrymn
