Is there a way to iterate over a SolrJ response such that the results are fetched incrementally during iteration, rather than returning a giant in-memory ArrayList?
Or do we have to resort to this:
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
int fetchSize = 1000;
query.setRows(fetchSize);
QueryResponse rsp = server.query(query);
long offset = 0;
long totalResults = rsp.getResults().getNumFound();
while (offset < totalResults)
{
    query.setStart((int) offset); // requires an int? wtf?
    query.setRows(fetchSize);
    for (SolrDocument doc : server.query(query).getResults())
    {
        log.info((String) doc.getFieldValue("title"));
    }
    offset += fetchSize;
}
And while I'm on the topic, why does SolrQuery.setStart() require an integer, when SolrDocumentList.getStart()/getNumFound() return long?
Iterating over a list can also be achieved using a while loop. The block of code inside the loop executes as long as the condition remains true. A loop variable can be used as an index to access each element.
SolrJ is an API that makes it easy for applications written in Java (or any language based on the JVM) to talk to Solr. SolrJ hides a lot of the details of connecting to Solr and allows your application to interact with Solr with simple high-level methods. SolrJ supports most Solr APIs, and is highly configurable.
SolrClient implementations are the main workhorses at the core of SolrJ. They handle the work of connecting to and communicating with Solr, and are where most of the user configuration happens.
That code looks correct. You could also wrap it in an Iterator so that your client code doesn't have to know anything about the underlying paging.
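A minimal sketch of such a wrapper follows. To keep it runnable without a Solr instance, the page lookup is hidden behind a hypothetical PageFetcher callback (PageFetcher, PagingIterator and PagingDemo are illustrative names, not part of SolrJ), and the demo pages over an in-memory list. The sketch also assumes that a page shorter than the requested size marks the end of the results, instead of reading getNumFound() up front.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical callback: returns up to `rows` items starting at offset `start`. */
interface PageFetcher<T> {
    List<T> fetch(int start, int rows);
}

/** Walks a paged result set lazily, holding only one page in memory at a time. */
class PagingIterator<T> implements Iterator<T> {
    private final PageFetcher<T> fetcher;
    private final int pageSize;
    private int nextStart = 0;
    private Iterator<T> page = Collections.emptyIterator();
    private boolean exhausted = false;

    PagingIterator(PageFetcher<T> fetcher, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.fetcher = fetcher;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page only when the current one has been consumed.
        while (!page.hasNext() && !exhausted) {
            List<T> results = fetcher.fetch(nextStart, pageSize);
            nextStart += results.size();
            // Assumption: a short (or empty) page means there is nothing left.
            exhausted = results.size() < pageSize;
            page = results.iterator();
        }
        return page.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.next();
    }
}

class PagingDemo {
    public static void main(String[] args) {
        // Stand-in for the index: seven "documents" served three at a time.
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            data.add(i);
        }
        PageFetcher<Integer> fetcher = (start, rows) ->
                data.subList(Math.min(start, data.size()),
                             Math.min(start + rows, data.size()));

        Iterator<Integer> it = new PagingIterator<>(fetcher, 3);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

With SolrJ, the fetcher would presumably call query.setStart(start) and query.setRows(rows), then return server.query(query).getResults(), with the checked exception from query() wrapped in an unchecked one. The client code then sees a plain Iterator and never touches offsets.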
About SolrQuery.setStart() requiring an Integer: it certainly looks odd. I think you're right and it should be a long as well. Try asking on the solr-user or lucene-dev mailing lists.
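In the meantime, the plain (int) cast in the question's loop would silently wrap around for offsets above Integer.MAX_VALUE. A small sketch of a stricter conversion, using the standard Math.toIntExact (StartOffsets and toStart are hypothetical helper names):

```java
class StartOffsets {
    /**
     * Converts a long offset to the int that setStart() expects.
     * Throws ArithmeticException instead of wrapping around on overflow.
     */
    static int toStart(long offset) {
        return Math.toIntExact(offset);
    }

    public static void main(String[] args) {
        long tooBig = Integer.MAX_VALUE + 1L;
        System.out.println((int) tooBig);   // plain cast wraps to a negative value
        System.out.println(toStart(1000L)); // in-range values pass through unchanged
    }
}
```

The loop would then call query.setStart(StartOffsets.toStart(offset)) and fail loudly rather than request a negative start.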
The reason, Caffeine, is that Solr is designed to give you the top X search results. The expectation is that you will have a "reasonable" number to return. If Solr has to look deep into the search results (into the thousands), you're rubbing against the grain of what Solr was designed for. It will work, but the query response will get slower and slower the deeper into the search results you have to go. There is some ongoing work in Solr to make this use case more efficient, but I've seen no progress on it lately.