Control number of buckets created in an aggregation

In Elasticsearch there's a limit on how many buckets an aggregation can create. If an aggregation creates more buckets than that limit, you get a warning message in ES 6.x, and an error is thrown in later versions.

Here's the warning message:

This aggregation creates too many buckets (10001) and will throw an error in future versions. You should update the [search.max_buckets] cluster setting or use the [composite] aggregation to paginate all buckets in multiple requests.

Since ES 7.x, that limit is set to 10000 by default, though it can be adjusted through the search.max_buckets cluster setting.
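For reference, adjusting the limit is a cluster settings update. The following is only a sketch: it builds the PUT request without sending it, and the helper name build_max_buckets_request, the base URL, and the value 20000 are illustrative assumptions rather than anything from my setup.

```python
import json
import urllib.request


def build_max_buckets_request(base_url: str, max_buckets: int) -> urllib.request.Request:
    """Build (but do not send) a cluster settings update for search.max_buckets.

    base_url is a hypothetical cluster address, e.g. "http://localhost:9200".
    """
    # "persistent" settings survive a full cluster restart.
    body = {"persistent": {"search.max_buckets": max_buckets}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Illustrative value only; pick a limit that suits your cluster's memory.
request = build_max_buckets_request("http://localhost:9200", 20000)
# To actually apply it: urllib.request.urlopen(request)
```

Raising the limit only moves the threshold; it does not explain how many buckets a given request creates.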

The problem is, I can't actually calculate (or estimate) how many buckets an aggregation is going to create.

Consider the following request:

GET /zone_stats_hourly/_search
{
   "aggs":{
      "apps":{
         "terms":{
            "field":"appId",
            "size":<NUM_TERM_BUCKETS>,
            "min_doc_count":1,
            "shard_min_doc_count":0,
            "show_term_doc_count_error":false,
            "order":[
               {
                  "_count":"desc"
               },
               {
                  "_key":"asc"
               }
            ]
         },
         "aggregations":{
            "days":{
               "date_histogram":{
                  "field":"processTime",
                  "time_zone":"UTC",
                  "interval":"1d",
                  "offset":0,
                  "order":{
                     "_key":"asc"
                  },
                  "keyed":false,
                  "min_doc_count":0
               },
               "aggregations":{
                  "requests":{
                     "sum":{
                        "field":"requests"
                     }
                  },
                  "filled":{
                     "sum":{
                        "field":"filledRequests"
                     }
                  },
                  "matched":{
                     "sum":{
                        "field":"matchedRequests"
                     }
                  },
                  "imp":{
                     "sum":{
                        "field":"impressions"
                     }
                  },
                  "cv":{
                     "sum":{
                        "field":"completeViews"
                     }
                  },
                  "clicks":{
                     "sum":{
                        "field":"clicks"
                     }
                  },
                  "installs":{
                     "sum":{
                        "field":"installs"
                     }
                  },
                  "actions":{
                     "sum":{
                        "field":"actions"
                     }
                  },
                  "earningsIRT":{
                     "sum":{
                        "field":"earnings.inIRT"
                     }
                  },
                  "earningsUSD":{
                     "sum":{
                        "field":"earnings.inUSD"
                     }
                  },
                  "earningsEUR":{
                     "sum":{
                        "field":"earnings.inEUR"
                     }
                  },
                  "dealBasedEarnings":{
                     "nested":{
                        "path":"dealBasedEarnings"
                     },
                     "aggregations":{
                        "types":{
                           "terms":{
                              "field":"dealBasedEarnings.type",
                              "size":4,
                              "min_doc_count":1,
                              "shard_min_doc_count":0,
                              "show_term_doc_count_error":false,
                              "order":[
                                 {
                                    "_count":"desc"
                                 },
                                 {
                                    "_key":"asc"
                                 }
                              ]
                           },
                           "aggregations":{
                              "dealBasedEarningsIRT":{
                                 "sum":{
                                    "field":"dealBasedEarnings.amount.inIRT"
                                 }
                              },
                              "dealBasedEarningsUSD":{
                                 "sum":{
                                    "field":"dealBasedEarnings.amount.inUSD"
                                 }
                              },
                              "dealBasedEarningsEUR":{
                                 "sum":{
                                    "field":"dealBasedEarnings.amount.inEUR"
                                 }
                              }
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   },
   "size":0,
   "_source":{
      "excludes":[]
   },
   "stored_fields":["*"],
   "docvalue_fields":[
      {
         "field":"eventTime",
         "format":"date_time"
      },
      {
         "field":"processTime",
         "format":"date_time"
      },
      {
         "field":"postBack.time",
         "format":"date_time"
      }
   ],
   "query":{
      "bool":{
         "must":[
            {
               "range":{
                  "processTime":{
                     "from":1565049600000,
                     "to":1565136000000,
                     "include_lower":true,
                     "include_upper":false,
                     "boost":1.0
                  }
               }
            }
         ],
         "adjust_pure_negative":true,
         "boost":1.0
      }
   }
}

If I set <NUM_TERM_BUCKETS> to 2200 and perform the request, I get the warning message that says I'm creating more than 10000 buckets (how?!).

A sample response from ES:

#! Deprecation: 299 Elasticsearch-6.7.1-2f32220 "This aggregation creates too many buckets (10001) and will throw an error in future versions. You should update the [search.max_buckets] cluster setting or use the [composite] aggregation to paginate all buckets in multiple requests."
{
  "took": 6533,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 103456,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "apps": {
      "doc_count_error_upper_bound": 9,
      "sum_other_doc_count": 37395,
      "buckets":[...]
    }
  }
}

More interestingly, after decreasing <NUM_TERM_BUCKETS> to 2100, I get no warning messages, which means the number of buckets created is below 10000.

I've had a hard time finding the reason behind that and found NOTHING.

Is there any formula or something to calculate or estimate the number of buckets that an aggregation is going to create before actually performing the request?

I want to know whether an aggregation will throw an error in ES 7.x or later for a given search.max_buckets, so that I can decide whether to use the composite aggregation or not.
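For context, the composite alternative named in the warning pages through buckets by passing the previous response's after_key back as "after". Here is a rough sketch of that loop, assuming a caller-supplied search function (hypothetical) that takes a request body and returns the parsed JSON response; the aggregation name "apps" and source name "app" are placeholders too.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def composite_body(field: str, page_size: int,
                   after: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a search body with one composite aggregation over a terms source."""
    composite: Dict[str, Any] = {
        "size": page_size,
        "sources": [{"app": {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {"apps": {"composite": composite}}}


def iter_buckets(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                 field: str, page_size: int) -> Iterator[Dict[str, Any]]:
    """Yield every bucket, fetching one page of at most page_size per request."""
    after = None
    while True:
        agg = search(composite_body(field, page_size, after))["aggregations"]["apps"]
        if not agg["buckets"]:
            return
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None:
            return
```

Each request then creates at most page_size top-level buckets, so the total no longer depends on how many distinct terms exist.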

UPDATE

I tried a much simpler aggregation containing no nested or sub aggregations on an index having roughly 80000 documents.

Here is the request:

GET /my_index/_search
{
   "size":0,
   "query":{
      "match_all":{}
   },
   "aggregations":{
      "unique":{
         "terms":{
            "field":"_id",
            "size":<NUM_TERM_BUCKETS>
         }
      }
   }
}

If I set the <NUM_TERM_BUCKETS> to 7000, I get this error response in ES 7.3:

{
   "error":{
      "root_cause":[
         {
            "type":"too_many_buckets_exception",
            "reason":"Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
            "max_buckets":10000
         }
      ],
      "type":"search_phase_execution_exception",
      "reason":"all shards failed",
      "phase":"query",
      "grouped":true,
      "failed_shards":[
         {
            "shard":0,
            "index":"my_index",
            "node":"XYZ",
            "reason":{
               "type":"too_many_buckets_exception",
               "reason":"Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
               "max_buckets":10000
            }
         }
      ]
   },
   "status":503
}

And it runs successfully if I decrease the <NUM_TERM_BUCKETS> to 6000.

Seriously, I'm confused. How on earth does this aggregation create more than 10000 buckets? Can anyone answer this?


asked Aug 07 '19 11:08

Ahmad Mozafarnia

1 Answer

According to the documentation for Terms Aggregation:

The shard_size parameter can be used to minimize the extra work that comes with bigger requested size. When defined, it will determine how many terms the coordinating node will request from each shard. Once all the shards responded, the coordinating node will then reduce them to a final result which will be based on the size parameter - this way, one can increase the accuracy of the returned terms and avoid the overhead of streaming a big list of buckets back to the client.

The default shard_size is (size * 1.5 + 10).

To address issues of accuracy in a distributed system, Elasticsearch asks for a number higher than size from each shard.

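Each shard is therefore asked for shard_size buckets rather than size, and it is shard_size that has to stay within search.max_buckets. So, the maximum value of NUM_TERM_BUCKETS for a simple terms aggregation can be calculated using the following formula:

maxNumTermBuckets = (search.max_buckets - 10) / 1.5

which is 6660 for search.max_buckets = 10000.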
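The arithmetic can be sketched in a few lines of Python. This only mirrors the documented default, size * 1.5 + 10, and the formula above; the function names are made up for illustration, and rounding down is an assumption about how fractional results are handled.

```python
DEFAULT_MAX_BUCKETS = 10000  # the default limit mentioned in the question


def default_shard_size(size: int) -> int:
    """Documented default shard_size: size * 1.5 + 10, rounded down."""
    return size * 3 // 2 + 10


def max_terms_size(max_buckets: int = DEFAULT_MAX_BUCKETS) -> int:
    """The formula from this answer, (max_buckets - 10) / 1.5, rounded down."""
    return (max_buckets - 10) * 2 // 3


def fits_limit(size: int, max_buckets: int = DEFAULT_MAX_BUCKETS) -> bool:
    """True if the default shard_size for this size stays within the limit."""
    return default_shard_size(size) <= max_buckets


for size in (6000, 6660, 7000):
    print(size, default_shard_size(size), fits_limit(size))
```

With these definitions, a size of 7000 maps to a default shard_size of 10510 and a size of 6000 to 9010, which is consistent with the error and the success reported in the question's update.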

answered Sep 18 '22 15:09

Ahmad Mozafarnia

Tags: elasticsearch, elasticsearch-aggregation
