
How to use Scroll on Elasticsearch aggregation?

I am using Elasticsearch 5.3. I am aggregating on some data, but there are far too many results to return in a single query. I tried setting size = Integer.MAX_VALUE, but even that was not enough. The ES search API has a way to scroll through search results. Is there a similar feature for the org.elasticsearch.search.aggregations.AggregationBuilders.terms aggregator, and how do I use it? Can the search scroll API be used with aggregations?

khateeb asked Apr 11 '17 10:04



1 Answer

In ES 5.3, you can partition the terms buckets and retrieve one partition per request.

For instance, the query below splits the buckets into 10 partitions and returns only the first one (partition 0). That is roughly one tenth of the data you would get by retrieving all buckets at once.

{
   "size": 0,
   "aggs": {
      "my_terms": {
         "terms": {
            "field": "my_field",
            "include": {
               "partition": 0,
               "num_partitions": 10
            },
            "size": 10000
         }
      }
   }
}

You can then make the second request by increasing the partition to 1, and so on up to partition 9:

{
   "size": 0,
   "aggs": {
      "my_terms": {
         "terms": {
            "field": "my_field",
            "include": {
               "partition": 1,
               "num_partitions": 10
            },
            "size": 10000
         }
      }
   }
}

To do the same in your Java code:

TermsAggregationBuilder agg = AggregationBuilders.terms("my_terms").field("my_field").size(10000);
// Partition 0 of 10; raise the first argument up to 9 on the following requests
agg.includeExclude(new IncludeExclude(0, 10));
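
Since the requests above differ only in the partition number, stepping through all partitions is a simple loop. The following is a minimal sketch in plain Java with no Elasticsearch dependency: it only builds the JSON request bodies and leaves sending them to whatever client you use. The class name PartitionedTermsRequests and the helpers partitionsFor, requestBody and allRequestBodies are made up for illustration and are not part of any Elasticsearch API. It also assumes the field name needs no JSON escaping, and that you have some estimate of the number of unique terms (for example from a cardinality aggregation) to pick the partition count.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionedTermsRequests {

    // Hypothetical helper: number of partitions needed so that each partition
    // holds, on average, at most bucketsPerRequest terms (ceiling division).
    public static int partitionsFor(long estimatedUniqueTerms, int bucketsPerRequest) {
        if (estimatedUniqueTerms <= 0) {
            return 1;
        }
        return (int) ((estimatedUniqueTerms + bucketsPerRequest - 1) / bucketsPerRequest);
    }

    // Builds one request body in the same shape as the queries above.
    // Assumption: field contains no characters that need JSON escaping.
    public static String requestBody(String field, int partition, int numPartitions, int size) {
        return "{\n"
            + "   \"size\": 0,\n"
            + "   \"aggs\": {\n"
            + "      \"my_terms\": {\n"
            + "         \"terms\": {\n"
            + "            \"field\": \"" + field + "\",\n"
            + "            \"include\": {\n"
            + "               \"partition\": " + partition + ",\n"
            + "               \"num_partitions\": " + numPartitions + "\n"
            + "            },\n"
            + "            \"size\": " + size + "\n"
            + "         }\n"
            + "      }\n"
            + "   }\n"
            + "}";
    }

    // One body per partition, from 0 up to numPartitions - 1.
    public static List<String> allRequestBodies(String field, int numPartitions, int size) {
        List<String> bodies = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            bodies.add(requestBody(field, partition, numPartitions, size));
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Example numbers only: about 95,000 unique terms, 10,000 buckets per request.
        int numPartitions = partitionsFor(95_000L, 10_000);
        for (String body : allRequestBodies("my_field", numPartitions, 10_000)) {
            System.out.println(body); // send each body as a separate search request
        }
    }
}
```

Terms are spread over partitions by hash, so the partitions are not guaranteed to be exactly equal in size. It may be worth choosing a size somewhat larger than the average partition so that no buckets are cut off.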
Val answered Oct 26 '22 06:10