I have a query returning ~200K hits from 7 different indices distributed across our cluster. I process my results as:
while (true) {
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
    for (SearchHit hit : scrollResp.getHits()) {
        // process hit
    }
    // Break condition: no hits are returned
    if (scrollResp.getHits().getHits().length == 0) {
        break;
    }
}
I'm noticing that the client.prepareSearchScroll line can hang for quite some time before returning the next set of search hits, and the delay seems to get worse the longer the code runs.
My setup for the search is:
SearchRequestBuilder searchBuilder = client.prepareSearch( index_names )
.setSearchType(SearchType.SCAN)
.setScroll(new TimeValue(60000)) //TimeValue?
.setQuery( qb )
.setFrom(0) //?
.setSize(5000); //number of jsons to get in each search, what should it be? I have no idea.
SearchResponse scrollResp = searchBuilder.execute().actionGet();
Is it expected that scanning and scrolling just take a long time when examining many results? I'm very new to Elasticsearch, so keep in mind that I may be missing something very obvious.
My query:
QueryBuilder qb = QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("tweet", interesting_words));
.setSize(5000) means that each client.prepareSearchScroll call is going to retrieve 5000 records per shard. You are requesting the source back, and if your records are big, assembling 5000 records per shard in memory might take a while. I would suggest trying a smaller number. Try 100 and 10 to see if you get better performance.
.setFrom(0) is not necessary.
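To put the per-shard size in numbers, here is a rough back-of-the-envelope sketch in plain Java (no Elasticsearch dependency). The class and method names are made up for illustration, and the shard count is an assumption: the question only says there are 7 indices, so the sketch assumes 5 primary shards per index, which was the default in older Elasticsearch versions. It also assumes every scroll call returns a full batch, so the call counts are lower bounds.

```java
// Rough estimate of how many hits one scroll round trip can return when
// SearchType.SCAN applies setSize(...) to every shard.
// Hypothetical helper for illustration only; not part of the Elasticsearch API.
final class ScrollBatchEstimate {

    // Upper bound on hits in a single scroll response: size is applied per shard.
    static long maxHitsPerScrollCall(int sizePerShard, int shardCount) {
        return (long) sizePerShard * shardCount;
    }

    // Minimum number of scroll calls needed to drain totalHits,
    // assuming every call returns a full batch (ceiling division).
    static long minScrollCalls(long totalHits, long hitsPerCall) {
        if (hitsPerCall <= 0) {
            throw new IllegalArgumentException("hitsPerCall must be positive");
        }
        return (totalHits + hitsPerCall - 1) / hitsPerCall;
    }

    public static void main(String[] args) {
        int indices = 7;            // from the question
        int shardsPerIndex = 5;     // ASSUMPTION: old default, check your index settings
        long totalHits = 200_000L;  // approximate figure from the question
        int shards = indices * shardsPerIndex;

        // Compare the original size with the two smaller values suggested above.
        for (int size : new int[] {5000, 100, 10}) {
            long perCall = maxHitsPerScrollCall(size, shards);
            long calls = minScrollCalls(totalHits, perCall);
            System.out.printf("size=%d -> up to %d hits per call, at least %d calls%n",
                    size, perCall, calls);
        }
    }
}
```

Under these assumptions, a size of 5000 lets a single scroll call gather nearly the whole result set at once, which would be consistent with the long pauses described in the question, while a size of 100 spreads the same work over many small responses.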