
How do I combine two Lucene queries using OR?

Tags: java, lucene

I'd like to search my index on two fields called "a" and "b". I am given searches like "Freud -- theories of psychology", and I'd like to perform the following query:

(a="Freud" AND b="theories of psychology") OR (b="Freud" AND a="theories of psychology")

How do I do this? So far I have Lucene constructing the two halves (firstHalf and secondHalf) using MultiFieldQueryParser, then I've combined them with

BooleanQuery combined = new BooleanQuery();
combined.add(firstHalf, BooleanClause.Occur.SHOULD);
combined.add(secondHalf, BooleanClause.Occur.SHOULD);

But combined allows results to be returned where only "theories" is found and not "psychology", where I definitely want both terms. It seems like Lucene is splitting "theories of psychology" into three words and combining them individually with OR. How do I prevent this?

firstHalf looks like:

Query firstHalf = MultiFieldQueryParser.parse(Version.LUCENE_33,
         new String[]{"Freud", "theories of psychology"},
         new String[]{"a", "b"},
         new BooleanClause.Occur[]{BooleanClause.Occur.MUST, BooleanClause.Occur.MUST},
         analyzer);

where analyzer is just a StandardAnalyzer object.

asked Nov 23 '11 17:11 by dmn


2 Answers

Figured it out myself, but now the code is significantly longer; if anyone knows a more elegant solution, please post and I'll gladly reward. :) (Although I'll be making this into a method shortly...but here's the full version of what's going on...)

// One parser per field; AND_OPERATOR makes every term in the input required.
QueryParser parserA = new QueryParser(Version.LUCENE_33, "a", analyzer);
parserA.setDefaultOperator(QueryParser.AND_OPERATOR);
QueryParser parserB = new QueryParser(Version.LUCENE_33, "b", analyzer);
parserB.setDefaultOperator(QueryParser.AND_OPERATOR);

// First half: a="Freud" AND b="theories of psychology"
Query a_0 = parserA.parse("Freud");
Query b_1 = parserB.parse("theories of psychology");

BooleanQuery firstHalf = new BooleanQuery();
firstHalf.add(a_0, BooleanClause.Occur.MUST);
firstHalf.add(b_1, BooleanClause.Occur.MUST);

// Second half: b="Freud" AND a="theories of psychology"
Query b_0 = parserB.parse("Freud");
Query a_1 = parserA.parse("theories of psychology");

BooleanQuery secondHalf = new BooleanQuery();
secondHalf.add(b_0, BooleanClause.Occur.MUST);
secondHalf.add(a_1, BooleanClause.Occur.MUST);

// Either half may match: SHOULD + SHOULD behaves as OR.
BooleanQuery combined = new BooleanQuery();
combined.add(firstHalf, BooleanClause.Occur.SHOULD);
combined.add(secondHalf, BooleanClause.Occur.SHOULD);

Turns out SHOULD does work the way I need it to here. Hopefully someone finds this helpful and I'm not just talking to myself in public ;)

answered Sep 24 '22 06:09 by dmn


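StandardAnalyzer tokenizes its input, and QueryParser joins the resulting terms with OR by default. So the query theories of psychology is equivalent to theories OR of OR psychology (minus any stop words the analyzer drops, such as "of").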

If you want to search for the phrase "theories of psychology", use a PhraseQuery, or note that the default QueryParser interprets double quotes as a phrase (i.e. change the string in your code to "\"theories of psychology\"").
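
As a rough sketch of the quoted-phrase approach, the whole swapped-field query can be written as one string in Lucene's query syntax (field:"phrase", AND, OR, parentheses) and parsed in a single call. The class and method names below (SwappedFieldQuery, phrase, build) are made up for illustration and are not part of Lucene; the code only builds the string and uses nothing beyond the JDK.

```java
// Sketch only: builds a query string in Lucene's classic query syntax.
// SwappedFieldQuery, phrase() and build() are hypothetical names, not Lucene API.
final class SwappedFieldQuery {

    // Wrap text in double quotes so the parser treats it as one phrase,
    // backslash-escaping any backslash or double quote already in the text.
    static String phrase(String text) {
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // Produces: (fieldA:"x" AND fieldB:"y") OR (fieldB:"x" AND fieldA:"y")
    static String build(String fieldA, String fieldB, String x, String y) {
        String first = "(" + fieldA + ":" + phrase(x) + " AND " + fieldB + ":" + phrase(y) + ")";
        String second = "(" + fieldB + ":" + phrase(x) + " AND " + fieldA + ":" + phrase(y) + ")";
        return first + " OR " + second;
    }

    public static void main(String[] args) {
        // Prints the query string for the example search from the question.
        System.out.println(build("a", "b", "Freud", "theories of psychology"));
    }
}
```

The resulting string could then be handed to a QueryParser's parse method, with the same analyzer used for indexing. Note that a quoted phrase requires the terms to appear next to each other and in order, which is stricter than merely requiring every term to be present.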

And yes, there is a sense in which Lucene doesn't use Boolean logic, but it's technical and not really relevant here.

answered Sep 20 '22 06:09 by Xodarap