Too many boolean clauses exception in Solr

Tags: solr, solrj

I am facing this problem when using the OR logical operator to build a query. I don't want to increase the maxBooleanClauses value. Is there any option other than that? My OR list can grow to about 2 million values. What I would like is for Solr to split the query into subqueries when the maxBooleanClauses limit is exceeded, and then merge the results of all the subqueries. Is something of this sort possible? Or can anyone suggest a better technique for this?

I want to plot a graph where the user provides a range of dates, e.g. 2013-03-01 to 2013-06-01, and the graph shows all the visitors who visited the app in that range. For this I want to make a query that is an OR of all the unique IDs, e.g.:

      uniqueId:(1001 OR 1003 OR 1009 OR ........ OR 102467)

Help is appreciated.

asked Jun 03 '13 16:06 by Ankit Ostwal


2 Answers

Solr imposes a maxBooleanClauses limit precisely because this kind of query is outside of its sweet spot. Ultimately, if you need to OR together millions of values, you will need to do your own distribution and aggregation outside of Solr.
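
A minimal Java sketch of that client-side "split and merge" approach, under a few assumptions. First, the batch size of 1024 is the usual default for maxBooleanClauses in solrconfig.xml, so check your own configuration. Second, only the query strings and the merge step are shown. Sending each sub-query to Solr (for example with SolrJ) is left out. buildBatchedQueries and mergeResults are hypothetical helpers, not part of any Solr API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class BatchedOrQuery {

    // Assumption: 1024 is the common default for maxBooleanClauses;
    // use whatever value your solrconfig.xml actually sets.
    static final int MAX_CLAUSES = 1024;

    // Splits the IDs into batches of at most batchSize and builds one
    // "field:(a OR b OR ...)" query string per batch, so no single
    // sub-query exceeds the clause limit.
    static List<String> buildBatchedQueries(String field, List<Long> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            List<Long> batch = ids.subList(start, Math.min(start + batchSize, ids.size()));
            String clauses = batch.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(" OR "));
            queries.add(field + ":(" + clauses + ")");
        }
        return queries;
    }

    // Client-side aggregation: an OR across batches is a set union of
    // the document keys each sub-query returned.
    static Set<String> mergeResults(List<Set<String>> perBatchResults) {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> result : perBatchResults) {
            merged.addAll(result);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 2500; i++) {
            ids.add(1000 + i);
        }
        List<String> queries = buildBatchedQueries("uniqueId", ids, MAX_CLAUSES);
        System.out.println(queries.size()); // 3 batches: 1024 + 1024 + 452 IDs
    }
}
```

A plain union is enough for a yes/no filter like the one in the question. If you also need relevance ranking, sorting, or paging across batches, the merge gets considerably harder. At 2 million IDs this also means roughly 2,000 round trips, which is part of why the answer calls this outside Solr's sweet spot.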

I am going to go out on a limb and guess that these clauses are graph-related, which is where I most commonly see these kinds of queries. In that case, it may be possible for you to stay somewhat inside Solr's strengths.

Sometimes it makes sense to invert the logic of your filter, and instead of passing in a large set of values to filter by, index those values onto the documents you are searching so you can pass a single value later.

For example, say you have an index of people. And say you want to search for people who are friends with some specific person. You could generate the list of IDs of all their friends in order to filter your search. But then you'll have a similar problem to what you're seeing here: lots and lots of OR clauses.

Alternatively, you can index each person's list of friends into Solr. Now you'll have a field with thousands of values in it, but your query filter will have only one value: the ID of the person whose network you are filtering the search by.
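
A minimal Java sketch of the inverted (denormalized) approach, using plain in-memory maps as a stand-in for Solr documents. The multi-valued field name friend_ids and the helper methods are hypothetical. Actually indexing the documents (for example with SolrJ) is not shown. The point is that the filter shrinks to a single value, however many friends a person has.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DenormalizedFriends {

    // Builds, for every person, the values of a hypothetical multi-valued
    // "friend_ids" field. Each undirected edge is written onto both
    // endpoints; this is the denormalization you must keep in sync yourself.
    static Map<String, Set<String>> buildFriendIdsField(List<String[]> edges) {
        Map<String, Set<String>> docs = new TreeMap<>();
        for (String[] e : edges) {
            docs.computeIfAbsent(e[0], k -> new TreeSet<>()).add(e[1]);
            docs.computeIfAbsent(e[1], k -> new TreeSet<>()).add(e[0]);
        }
        return docs;
    }

    // The filter needs only one value: the ID of the person whose network
    // you are searching, instead of one OR clause per friend.
    static String friendFilter(String personId) {
        return "friend_ids:" + personId;
    }

    // Local stand-in for what that filter would match: people whose
    // friend_ids field contains personId.
    static List<String> matching(Map<String, Set<String>> docs, String personId) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> doc : docs.entrySet()) {
            if (doc.getValue().contains(personId)) {
                out.add(doc.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> edges = List.of(
                new String[] {"alice", "bob"},
                new String[] {"alice", "carol"},
                new String[] {"bob", "dave"});
        Map<String, Set<String>> docs = buildFriendIdsField(edges);
        System.out.println(friendFilter("alice"));   // friend_ids:alice
        System.out.println(matching(docs, "alice")); // [bob, carol]
    }
}
```

The cost mentioned next is visible in buildFriendIdsField: adding or removing a single friendship changes two documents, and in Solr both would have to be updated.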

This plays more toward Solr's strengths as far as the mechanics of searching are concerned. However, there is a cost: you'll need to manage the denormalization yourself, and you will probably be making a lot of updates to your documents or suffering some latency in updates to your graph.

If that proves too onerous, you may need to consider a different technology better optimized for graph traversal.

answered Oct 17 '22 17:10 by Nick Zadrozny


You can also use a query parser that is better suited to this, such as the terms query parser ({!terms}), which handles a massive list of OR'd terms better than a plain Boolean query.

Example:

{!terms f=uniqueId}1000,1001,10002,10003

The default separator is a comma, so all the terms being searched can be provided as term1,term2,term3, and so on.
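
Building that query from a list of IDs is simple string joining. A minimal Java sketch follows. termsQuery is a hypothetical helper, and only the f local parameter and the default comma separator from the answer are used.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TermsQueryBuilder {

    // Builds a local-params query for Solr's terms query parser using the
    // default comma separator: {!terms f=<field>}v1,v2,v3
    static String termsQuery(String field, List<?> values) {
        String joined = values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return "{!terms f=" + field + "}" + joined;
    }

    public static void main(String[] args) {
        // Reproduces the example from the answer.
        System.out.println(termsQuery("uniqueId", List.of(1000, 1001, 10002, 10003)));
        // {!terms f=uniqueId}1000,1001,10002,10003
    }
}
```

A query like this is typically passed as a filter query (fq). With around 2 million IDs the request runs to megabytes of text, so you may need to send it in a POST body rather than in the URL.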

More details here: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

answered Oct 17 '22 17:10 by Rohit