I'd like to search my index on two fields called "a" and "b". I am given searches like Freud -- theories of psychology and I'd like to perform the following query:
(a="Freud" AND b="theories of psychology") OR (b="Freud" AND a="theories of psychology")
How do I do this? So far I have Lucene constructing the two halves (firstHalf and secondHalf) using MultiFieldQueryParser, then I've combined them with:
BooleanQuery combined = new BooleanQuery();
combined.add(firstHalf, BooleanClause.Occur.SHOULD);
combined.add(secondHalf, BooleanClause.Occur.SHOULD);
But combined allows results to be returned where only "theories" is found and not "psychology", where I definitely want both terms. It seems like Lucene is splitting "theories of psychology" into three words and combining them individually with OR. How do I prevent this?
firstHalf looks like:
Query firstHalf = MultiFieldQueryParser.parse(Version.LUCENE_33,
new String[]{"Freud", "theories of psychology"},
new String[]{"a", "b"},
new BooleanClause.Occur[]{BooleanClause.Occur.MUST, BooleanClause.Occur.MUST},
analyzer);
where analyzer is just a StandardAnalyzer object.
Figured it out myself, but now the code is significantly longer; if anyone knows a more elegant solution, please post and I'll gladly reward. :) (Although I'll be making this into a method shortly...but here's the full version of what's going on...)
QueryParser parser = new QueryParser(Version.LUCENE_33, "a", analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query a_0 = parser.parse("Freud");
parser = new QueryParser(Version.LUCENE_33, "b", analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query b_1 = parser.parse("theories of psychology");
BooleanQuery firstHalf = new BooleanQuery();
firstHalf.add(a_0, BooleanClause.Occur.MUST);
firstHalf.add(b_1, BooleanClause.Occur.MUST);
parser = new QueryParser(Version.LUCENE_33, "b", analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query b_0 = parser.parse("Freud");
parser = new QueryParser(Version.LUCENE_33, "a", analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
Query a_1 = parser.parse("theories of psychology");
BooleanQuery secondHalf = new BooleanQuery();
secondHalf.add(b_0, BooleanClause.Occur.MUST);
secondHalf.add(a_1, BooleanClause.Occur.MUST);
BooleanQuery combined = new BooleanQuery();
combined.add(firstHalf, BooleanClause.Occur.SHOULD);
combined.add(secondHalf, BooleanClause.Occur.SHOULD);
Turns out SHOULD does work the way I need it to here. Hopefully someone finds this helpful and I'm not just talking to myself in public ;)
StandardAnalyzer tokenizes its input, and the query parser joins the resulting terms with OR by default. So the query theories of psychology is equivalent to theories OR of OR psychology.
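To make the difference concrete, here is a toy model in plain Java of the three matching behaviours involved: any term (OR), all terms (AND), and phrase. It is not Lucene code. The class MatchModes and its methods are made up for illustration, and the tokenizer is a crude stand-in for an analyzer (lowercase, split on non-alphanumerics, no stop-word handling).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of this is part of Lucene.
final class MatchModes {
    // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // OR semantics: at least one query term occurs in the document.
    static boolean anyTerm(String doc, String query) {
        List<String> docTokens = tokens(doc);
        return tokens(query).stream().anyMatch(docTokens::contains);
    }

    // AND semantics: every query term occurs somewhere in the document.
    static boolean allTerms(String doc, String query) {
        return tokens(doc).containsAll(tokens(query));
    }

    // Phrase semantics: the query terms occur consecutively, in order.
    static boolean phrase(String doc, String query) {
        return Collections.indexOfSubList(tokens(doc), tokens(query)) >= 0;
    }

    public static void main(String[] args) {
        String doc = "Freud wrote about theories of mind";
        String query = "theories of psychology";
        System.out.println(anyTerm(doc, query));   // true: "theories" alone is enough
        System.out.println(allTerms(doc, query));  // false: "psychology" is missing
        System.out.println(phrase(doc, query));    // false
    }
}
```

Under OR semantics the document matches on "theories" alone, which is the behaviour described in the question. Setting the default operator to AND corresponds to allTerms, and quoting the input corresponds to phrase.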
If you want to search for the phrase "theories of psychology", use a PhraseQuery, or else note that the default QueryParser will interpret quotes as meaning a phrase (i.e. change your code to be "\"theories of psychology\"").
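Building on the quoting approach, one possible shortcut is to assemble the whole swapped-field query as a single string in Lucene's query syntax (field:"phrase") and hand it to one QueryParser.parse call. The sketch below only builds the string, so it runs without Lucene on the classpath. The class SwappedFieldQuery and its methods are hypothetical names, and the escaping covers only backslashes and double quotes, which is an assumption about what the input may contain.

```java
// Hypothetical helper; only the generated string is Lucene query syntax.
final class SwappedFieldQuery {
    // Wrap a value in double quotes so the parser treats it as one phrase.
    // Backslashes and embedded quotes are backslash-escaped first.
    static String phrase(String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // A single field:"phrase" clause.
    static String clause(String field, String value) {
        return field + ":" + phrase(value);
    }

    // (fieldA:first AND fieldB:second) OR (fieldB:first AND fieldA:second)
    static String build(String fieldA, String fieldB, String first, String second) {
        return "(" + clause(fieldA, first) + " AND " + clause(fieldB, second) + ")"
                + " OR "
                + "(" + clause(fieldB, first) + " AND " + clause(fieldA, second) + ")";
    }

    public static void main(String[] args) {
        // prints: (a:"Freud" AND b:"theories of psychology") OR (b:"Freud" AND a:"theories of psychology")
        System.out.println(build("a", "b", "Freud", "theories of psychology"));
    }
}
```

The resulting string could then be passed to parser.parse(...) on a QueryParser like the one in the question. Because every clause names its field explicitly, the parser's default field should not matter here.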
And yes, there is a sense in which Lucene doesn't use Boolean logic, but it's technical and not really relevant here.