I'm using Java to query a Solr server for results that have IDs within a set of known IDs that I am interested in.
The best way I could think of to get just these results was to create a long query string that looks something like this:
q=(item_id:XXX33-3333 OR item_id:YYY42-3445 OR item_id:JFDE-3838)
I generate this String, queryString, before making my request, and there are over 1500 such ids included in the request I would eventually like to make. I am using an HTTP POST to make the query as such:
HttpPost post = new HttpPost(url);
post.setHeader("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
StringEntity entity = new StringEntity(queryString, "UTF-8");
entity.setContentType("application/x-www-form-urlencoded; charset=utf-8");
post.setEntity(entity);
HttpClient client = new DefaultHttpClient();
HttpResponse response = client.execute(post);
If I limit the query to just the first 1000 ids, it succeeds and I get the results back as I would expect. However, if I increase the query to include all 1500 that I am really interested in, I get an HTTP 400 response code with the following error:
HTTP/1.1 400 org.apache.lucene.queryParser.ParseException: Cannot parse '[my query here...]
Is there a limit to the number of ids that I can OR together in a Solr query? Is there another reason this might be failing when I go past 1000? I have experimented and it fails at around 1024 (my ids are all almost the same length) so it seems to suggest there is a character or term limit.
Or, if someone has a good suggestion of how I can retrieve the items I'm looking for in another, smarter, way, I would love to hear it. My backup solution is just to query Solr for all items, parse the results, and use the ones that belong to the set I am interested in. I would prefer not to do this, since the data source could have tens of thousands of items, and it would be inefficient.
There is no limit on the Solr side - we regularly use Solr in a similar way with tens of thousands of IDs in the query.
You need to look at the settings for your servlet container (Tomcat, Jetty etc.) and increase the maximum POST size. Look up maxPostSize if you are using Tomcat and maxFormContentSize if you are using Jetty.
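Before changing container settings, it may help to measure how large the POST body actually is. The following is a minimal sketch, not the asker's code: the class name SolrFormBody and its helper methods are hypothetical, and it assumes the ids contain no characters that need escaping in Solr query syntax. It builds the URL-encoded form body and reports its size in bytes, which can then be compared with whatever maxPostSize or maxFormContentSize value the container is configured with.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Solr or HttpClient.
class SolrFormBody {

    // Builds "q=<url-encoded query>" for an application/x-www-form-urlencoded POST.
    // Assumes the ids need no Solr query-syntax escaping.
    static String buildFormBody(String field, List<String> ids) {
        StringBuilder query = new StringBuilder("(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                query.append(" OR ");
            }
            query.append(field).append(':').append(ids.get(i));
        }
        query.append(')');
        try {
            return "q=" + URLEncoder.encode(query.toString(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java runtime.
            throw new IllegalStateException(e);
        }
    }

    // Size of the body as it would be sent on the wire with UTF-8 encoding.
    static int sizeInBytes(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("XXX33-3333", "YYY42-3445", "JFDE-3838");
        String body = buildFormBody("item_id", ids);
        System.out.println(body);
        System.out.println(sizeInBytes(body) + " bytes");
    }
}
```

If the measured size is well below the container's limit, the POST size is probably not what is causing the 400 response.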
As of Solr 6.0 there is a maxBooleanClauses configuration within Solr; it defaults to 1024.
I wrote a unit test and confirmed the limitation (with Solr 5.3).
See more here: https://wiki.apache.org/solr/SolrConfigXml#The_Query_Section
FWIW, there is an open Solr JIRA to remove this limit, so it may go away in the future: https://issues.apache.org/jira/browse/SOLR-4586
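If raising maxBooleanClauses is not an option, one possible workaround is to split the id list into batches that each stay under the limit and send one query per batch. This is only a sketch under assumptions: the class IdBatcher and its method are hypothetical names, the batch size of 1000 is an arbitrary value below the 1024 default mentioned above, and the ids are assumed to need no query-syntax escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
class IdBatcher {

    // Returns one "(field:id1 OR field:id2 ...)" query string per batch,
    // with at most maxClauses ids OR'ed together in each.
    static List<String> buildBatchedQueries(String field, List<String> ids, int maxClauses) {
        if (maxClauses < 1) {
            throw new IllegalArgumentException("maxClauses must be >= 1");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += maxClauses) {
            int end = Math.min(start + maxClauses, ids.size());
            StringBuilder q = new StringBuilder("(");
            for (int i = start; i < end; i++) {
                if (i > start) {
                    q.append(" OR ");
                }
                q.append(field).append(':').append(ids.get(i));
            }
            queries.add(q.append(')').toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("XXX33-3333", "YYY42-3445", "JFDE-3838");
        // A batch size of 2 is used here only to make the split visible.
        for (String q : buildBatchedQueries("item_id", ids, 2)) {
            System.out.println("q=" + q);
        }
    }
}
```

Each returned string would be sent as its own request and the results merged on the client. That costs extra round trips, but it avoids fetching the whole index and filtering locally.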