Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lucene: Searching multiple fields with default operator = AND

Tags:

java

lucene

To allow users to search across multiple fields with Lucene 3.5 I currently create and add a QueryParser for each field to be searched to a DisjunctionMaxQuery. This works great when using OR as the default operator but I now want to change the default operator to AND to get more accurate (and fewer) results.

Problem is, queryParser.setDefaultOperator(QueryParser.AND_OPERATOR) misses many documents since all terms must be in atleast 1 field.

For example, consider the following data for a document: title field = "Programming Languages", body field = "Java, C++, PHP". If a user were to search for Java Programming this particular document would not be included in the results since the title nor the body field contains all terms in the query although combined they do. I would want this document returned for the above query but not for the query HTML Programming.

I've considered a catchall field but I have a few problems with it. First, users frequently include per field terms in their queries (author:bill) which is not possible with a catchall field. Also, I highlight certain fields with FastVectorHighlighter which requires them to be indexed and stored. So by adding a catchall field I would have to index most of the same data twice which is time and space consuming.

Any ideas?

like image 674
Chris Davi Avatar asked Dec 17 '12 00:12

Chris Davi


1 Answers

Guess I should have done a little more research. Turns out MultiFieldQueryParser provides the exact functionality I was looking for. For whatever reason I was creating a QueryParser for each field I wanted to search like this:

String[] fields = {"title", "body", "subject", "author"};
QueryParser[] parsers = new QueryParser[fields.length];      
for(int i = 0; i < parsers.length; i++)
{
   parsers[i] = new QueryParser(Version.LUCENE_35, fields[i], analyzer);
   parsers[i].setDefaultOperator(QueryParser.AND_OPERATOR);
}

This would result in a query like this:

(+title:java +title:programming) | (+body:java +body:programming)

...which is not what I was looking. Now I create a single MultiFieldQueryParser like this:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_35, new String[]{"title", "body", "subject"}, analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);

This gives me the query I was looking for:

+(title:java body:java) +(title:programming body:programming)

Thanks to @seeta and @femtoRgon for the help!

like image 123
Chris Davi Avatar answered Oct 11 '22 22:10

Chris Davi