
elasticsearch "Trying to create too many buckets" with nested bucket aggregations

I have this query which creates 3 nested buckets:

POST /videos/_search
{
  "aggs":{
    "filtered_videos":{
      "filter":{
        "terms":{
          "videoId.keyword":[
            "randomId1",
            "randomId2",
            "randomId3",
            500 more...
          ]
        }
      },
      "aggs":{
        "filtered_usernames":{
          "filter":{
            "terms":{
              "username.keyword":[
                "userExample1",
                "userExample2",
                "userExample3",
                500 more...
              ]
            }
          },
          "aggs":{
            "success_actions":{
              "filter":{
                "term":{
                  "success":true
                }
              },
              "aggs":{
                "usernames":{
                  "terms":{
                    "field":"username.keyword",
                    "size":10000
                  },
                  "aggs":{
                    "videos":{
                      "terms":{
                        "field":"videoId.keyword",
                        "size":10000,
                        "missing":"random"
                      },
                      "aggs":{
                        "actions":{
                          "terms":{
                            "field":"actionType.keyword",
                            "size":10000
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

This creates 3 nested levels of buckets: usernames, the videos of each username, and the actions of each video of each username, which is exactly what I want.

The problem is that the default limit in Elasticsearch appears to be 10,000 buckets (the search.max_buckets setting). For my use case, however, I need 500 username buckets, each with 500 video buckets, each with 20 action buckets: 500 * 500 * 20, or 5 million buckets. I know I can raise the limit; that's not my question.

My questions are:

  • Does Elasticsearch count each child bucket as a bucket, meaning I would have to raise the limit to 5 million, or does it calculate the total another way?
  • Will Elasticsearch be able to handle such a query, or will it crash if I raise the limit to 5 million?
  • How can I optimize my query to get the same data with fewer buckets?
asked Jan 13 '20 by user12341234


1 Answer

Does Elasticsearch count each child bucket as a bucket, meaning I would have to raise the limit to 5 million, or does it calculate the total another way?

Yes, every bucket on every level counts toward the limit: each of the 500 "root" username buckets contains 500 video buckets, and each of those contains 20 action buckets, so 500 * 500 * 20 at the deepest level alone. And note that your query actually has 6 levels of aggregations, since each filter aggregation adds a level of its own...
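
A worked count for the query above (assuming exactly 500 matching usernames, 500 videos per username, and 20 actions per video): the three filter aggregations contribute 1 bucket each, usernames contributes 500, videos contributes 500 * 500 = 250,000, and actions contributes 500 * 500 * 20 = 5,000,000, for a total of 5,250,503 buckets counted against the limit.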

Will Elasticsearch be able to handle such a query, or will it crash if I raise the limit to 5 million?

That's a big number, and whether ES can handle it is impossible to say in advance. Too many variables are involved (number of nodes, how they are resourced, the load they are handling, memory usage, CPU usage, and so on); only testing can answer the question. The same query might run successfully some of the time and fail at other times (when the cluster is under heavier load), for example.
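
If you do decide to test it, the limit in question is the search.max_buckets setting; a minimal sketch of raising it, assuming a version where it is a dynamic cluster setting (it is in 7.x), with the 5,000,000 value being nothing more than the number computed above:

PUT /_cluster/settings
{
  "persistent":{
    "search.max_buckets":5000000
  }
}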

How can I optimize my query to get the same data with fewer buckets?

First of all, why do you need that many results in one go? That's a humanly impossible number to look through. Try a composite aggregation instead and "paginate" through the results, as sketched below.
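
A minimal sketch of that approach, assuming the same index and fields as in the question (the aggregation name user_video_pairs is made up here, and missing_bucket, the composite counterpart of your "missing" option, requires Elasticsearch 6.4 or later). Moving the three filters into the query also removes the filter aggregation levels entirely:

POST /videos/_search
{
  "size":0,
  "query":{
    "bool":{
      "filter":[
        {"terms":{"videoId.keyword":["randomId1","randomId2","500 more..."]}},
        {"terms":{"username.keyword":["userExample1","userExample2","500 more..."]}},
        {"term":{"success":true}}
      ]
    }
  },
  "aggs":{
    "user_video_pairs":{
      "composite":{
        "size":1000,
        "sources":[
          {"username":{"terms":{"field":"username.keyword"}}},
          {"video":{"terms":{"field":"videoId.keyword","missing_bucket":true}}}
        ]
      },
      "aggs":{
        "actions":{
          "terms":{"field":"actionType.keyword","size":20}
        }
      }
    }
  }
}

Each response returns at most 1000 (username, video) pairs plus their action sub-buckets, well under the default limit, and includes an after_key object; pass that back as the composite aggregation's "after" parameter in the next request to fetch the next page, repeating until after_key is no longer present.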

answered Nov 14 '22 by Andrei Stefan