
How do I reduce Elasticsearch scroll response time?

I have a query returning ~200K hits from 7 different indices distributed across our cluster. I process my results as:

while (true) {
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();

    for (SearchHit hit : scrollResp.getHits()) {
        // process hit
    }

    //Break condition: No hits are returned
    if (scrollResp.hits().hits().length == 0) {
        break;
    }
}

I'm noticing that the client.prepareSearchScroll line can hang for quite some time before returning the next set of search hits. This seems to get worse the longer the code runs.

My setup for the search is:

SearchRequestBuilder searchBuilder = client.prepareSearch( index_names )
    .setSearchType(SearchType.SCAN)
    .setScroll(new TimeValue(60000)) //TimeValue?
    .setQuery( qb )
    .setFrom(0) //?
    .setSize(5000); //number of jsons to get in each search, what should it be? I have no idea.
SearchResponse scrollResp = searchBuilder.execute().actionGet();

Is it expected that scanning and scrolling just take a long time when examining many results? I'm very new to Elasticsearch, so keep in mind that I may be missing something very obvious.

My query:

QueryBuilder qb = QueryBuilders.boolQuery().must(QueryBuilders.termsQuery("tweet", interesting_words));
dranxo asked Nov 20 '12 00:11


1 Answer

With SearchType.SCAN, .setSize(5000) means that each client.prepareSearchScroll call is going to retrieve up to 5000 records per shard, not 5000 records in total. You are requesting the source back, and if your records are big, assembling 5000 records per shard in memory might take a while. I would suggest trying a smaller number. Try 100 and 10 to see if you get better performance.

.setFrom(0) is not necessary.
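
To make the per-shard effect concrete, here is a rough back-of-the-envelope sketch in plain Java (no Elasticsearch dependency). The class and method names are hypothetical, and the shard count of 35 is an assumption (7 indices with 5 shards each); the question does not state how many shards the indices have. It also assumes every shard can fill its quota on each call, so the number of calls is a lower bound rather than a prediction.

```java
// Rough estimate of scroll batch sizes under SearchType.SCAN, where
// setSize(n) applies per shard rather than per request.
// All names here are illustrative, not part of the Elasticsearch API.
class ScrollBatchEstimate {

    // Upper bound on hits returned by one scroll call: size is applied per shard.
    static long hitsPerRound(int sizePerShard, int shardCount) {
        return (long) sizePerShard * shardCount;
    }

    // Minimum number of scroll calls needed to drain totalHits, assuming
    // every shard fills its quota on every call (an idealization).
    static long minRounds(long totalHits, int sizePerShard, int shardCount) {
        long perRound = hitsPerRound(sizePerShard, shardCount);
        return (totalHits + perRound - 1) / perRound; // ceiling division
    }

    public static void main(String[] args) {
        long totalHits = 200_000L; // "~200K hits" from the question
        int shardCount = 35;       // ASSUMPTION: 7 indices x 5 shards each
        for (int size : new int[] {5000, 100, 10}) {
            System.out.println("size=" + size
                    + " -> up to " + hitsPerRound(size, shardCount) + " hits per call"
                    + ", at least " + minRounds(totalHits, size, shardCount) + " calls");
        }
    }
}
```

Under these assumptions, a size of 5000 asks the cluster to assemble most of the result set in a single response, while 100 or 10 spreads the same work over many smaller responses. Which setting is fastest overall still has to be measured on the actual cluster.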

imotov answered Sep 21 '22 01:09