I've had an app doing prefix searches for a while. Recently the index grew, and it turned out that some prefixes were too darned numerous for Lucene to handle. It kept throwing a Too Many Clauses error, which was very frustrating because I kept looking through my JARs and confirming that none of the included code actually used a boolean query.
Why doesn't it throw something like a Too Many Hits exception? And why does increasing the boolean query's static max clauses integer actually make this error go away, when I'm definitely only using a prefix query? Is there something fundamental to how queries are run that I'm not understanding; is it that they secretly become Boolean queries?
I've hit this before. It happens because Lucene, under the covers, turns many (all?) things into boolean queries when you call Query.rewrite().
From: http://web.archive.org/web/20110915061619/http://lucene.apache.org:80/java/2_2_0/api/org/apache/lucene/search/Query.html
public Query rewrite(IndexReader reader) throws IOException
Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys.
Throws: IOException
The API reference page of TooManyClauses shows that PrefixQuery, FuzzyQuery, WildcardQuery, and RangeQuery are expanded this way (into BooleanQuery). Since it is in the API reference, it should be a behavior that users can rely on. Lucene does not place arbitrary limits on the number of hits (other than a document ID being an int) so a "too many hits" exception might not make sense. Perhaps PrefixQuery.rewrite(IndexReader) should catch the TooManyClauses and throw a "too many prefixes" exception, but right now it does not behave that way.
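To make the expansion concrete, here is a rough toy model of what the rewrite does. This is not Lucene code: PrefixRewriteSketch, rewritePrefix and the local TooManyClauses class are made-up stand-ins, and the sorted set of strings is only a stand-in for the index's term dictionary. The one number borrowed from Lucene is the default clause limit of 1024, which as far as I know is what BooleanQuery.setMaxClauseCount(int) changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Toy model only: none of these types are Lucene classes.
class PrefixRewriteSketch {

    // Stand-in for the static clause limit on BooleanQuery (Lucene's default is 1024).
    static int maxClauseCount = 1024;

    // Stand-in for BooleanQuery.TooManyClauses.
    static class TooManyClauses extends RuntimeException {
        TooManyClauses(int limit) {
            super("more than " + limit + " clauses");
        }
    }

    // Mimics the rewrite: walk every indexed term that starts with the prefix
    // and add one clause per term. The count that matters is distinct terms,
    // not matching documents.
    static List<String> rewritePrefix(SortedSet<String> indexedTerms, String prefix) {
        List<String> clauses = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the prefix itself.
        for (String term : indexedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // walked past the last term with this prefix
            }
            if (clauses.size() >= maxClauseCount) {
                throw new TooManyClauses(maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }
}
```

With 2000 distinct terms under one prefix and the limit at 1024, rewritePrefix throws; raise maxClauseCount above 2000 and the same call succeeds. That would explain why bumping the static limit makes the error go away even though you never construct a boolean query yourself.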
By the way, another way to search by prefix is to use PrefixFilter. Either filter your query with it or wrap the filter with a ConstantScoreQuery.
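If I remember the 2.x API correctly, the wrapped form is something like new ConstantScoreQuery(new PrefixFilter(new Term(field, prefix))). The reason this sidesteps the clause limit, as I understand it, is that the filter never builds clauses at all: it walks the matching terms and marks their documents in a bit set. Below is a toy sketch of that idea, again not Lucene code. PrefixFilterSketch, matchingDocs and the postings map (term to document IDs) are hypothetical names made up for illustration.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;

// Toy model only: not Lucene code.
class PrefixFilterSketch {

    // postings maps each indexed term to the IDs of the documents containing it.
    // Returns one bit per matching document; no per-term clause is ever created,
    // so the number of distinct terms under the prefix does not matter.
    static BitSet matchingDocs(SortedMap<String, int[]> postings, String prefix) {
        BitSet docs = new BitSet();
        for (Map.Entry<String, int[]> entry : postings.tailMap(prefix).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // walked past the last term with this prefix
            }
            for (int docId : entry.getValue()) {
                docs.set(docId);
            }
        }
        return docs;
    }
}
```

The trade-off is that every match gets the same score, since nothing records which term matched a given document. For prefix searches that is usually acceptable.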