Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

With Lucene: Why do I get a Too Many Clauses error if I do a prefix search?

I've had an app doing prefix searches for a while. Recently the index size was increased and it turned out that some prefixes were too darned numerous for lucene to handle. It kept throwing me a Too Many Clauses error, which was very frustrating as I kept looking at my JARs and confirming that none of the included code actually used a boolean query.

Why doesn't it throw something like a Too Many Hits exception? And why does increasing the boolean query's static max clauses integer actually make this error go away, when I'm definitely only using a prefix query? Is there something fundamental to how queries are run that I'm not understanding; is it that they secretly become Boolean queries?

like image 654
dlamblin Avatar asked Aug 12 '08 06:08

dlamblin


2 Answers

I've hit this before. It has to do with the fact that lucene, under the covers, turns many (all?) things into boolean queries when you call Query.rewrite()

From: http://web.archive.org/web/20110915061619/http://lucene.apache.org:80/java/2_2_0/api/org/apache/lucene/search/Query.html

public Query rewrite(IndexReader reader)
              throws IOException

    Expert: called to re-write queries into primitive queries.
            For example, a PrefixQuery will be rewritten into a
            BooleanQuery that consists of TermQuerys.

    Throws:
        IOException
like image 126
Ryan Ahearn Avatar answered Oct 20 '22 15:10

Ryan Ahearn


The API reference page of TooManyClauses shows that PrefixQuery, FuzzyQuery, WildcardQuery, and RangeQuery are expanded this way (into BooleanQuery). Since it is in the API reference, it should be a behavior that users can rely on. Lucene does not place arbitrary limits on the number of hits (other than a document ID being an int) so a "too many hits" exception might not make sense. Perhaps PrefixQuery.rewrite(IndexReader) should catch the TooManyClauses and throw a "too many prefixes" exception, but right now it does not behave that way.

By the way, another way to search by prefix is to use PrefixFilter. Either filter your query with it or wrap the filter with a ConstantScoreQuery.

like image 35
Kai Chan Avatar answered Oct 20 '22 15:10

Kai Chan