I have this query which creates 3 nested buckets:
POST /videos/_search
{
  "aggs":{
    "filtered_videos":{
      "filter":{
        "terms":{
          "videoId.keyword":[
            "randomId1",
            "randomId2",
            "randomId3",
            500 more...
          ]
        }
      },
      "aggs":{
        "filtered_usernames":{
          "filter":{
            "terms":{
              "username.keyword":[
                "userExample1",
                "userExample2",
                "userExample3",
                500 more...
              ]
            }
          },
          "aggs":{
            "success_actions":{
              "filter":{
                "term":{
                  "success":true
                }
              },
              "aggs":{
                "usernames":{
                  "terms":{
                    "field":"username.keyword",
                    "size":10000
                  },
                  "aggs":{
                    "videos":{
                      "terms":{
                        "field":"videoId.keyword",
                        "size":10000,
                        "missing":"random"
                      },
                      "aggs":{
                        "actions":{
                          "terms":{
                            "field":"actionType.keyword",
                            "size":10000
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
This creates 3 nested levels of buckets: usernames, the videos of each username, and the actions of each video of each username, which is exactly what I want.
The problem is that Elasticsearch appears to have a default limit of 10000 buckets. For my use case, however, I need 500 username buckets, each with 500 video buckets, each with 20 action buckets: 500 * 500 * 20, or 5 million buckets. I know I can raise the limit; that's not my question.
My questions are:
Does Elasticsearch count each child bucket as a bucket, which means I would have to raise the limit to 5 million, or does it calculate the total some other way?
Yes. Each "root" bucket contains 500 buckets, each of those 500 contains 20, and so on. So, yes, 500 * 500 * 20. Note, though, that your query has 6 levels of aggregations, not 3: the three filter aggregations sit above the three terms aggregations.
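To make the arithmetic concrete, here is a small sketch using the cardinalities from the question. The leaf count is the 5 million figure; the second total assumes the username and video buckets are counted as well, which is an assumption about the accounting and may depend on the Elasticsearch version.

```python
# Cardinalities taken from the question.
usernames = 500
videos_per_username = 500
actions_per_video = 20

# Number of buckets at each of the three terms levels.
username_buckets = usernames
video_buckets = usernames * videos_per_username
action_buckets = usernames * videos_per_username * actions_per_video

# Leaf buckets only (the 500 * 500 * 20 figure).
leaf_buckets = action_buckets

# Assumption: parent buckets also count toward the limit.
all_levels = username_buckets + video_buckets + action_buckets

print(leaf_buckets)  # 5000000
print(all_levels)    # 5250500
```

Either way, the total is several hundred times the 10000 mentioned in the question.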
Will Elasticsearch be able to handle such a query, or will it crash if I raise the limit to 5 million?
That's a big number, and whether ES can handle it is impossible to say in general. Many variables are involved (the number of nodes, how well resourced they are, the load they are handling, memory usage, CPU usage, and so on), and only testing can answer the question. The query could, for example, succeed at some times and fail at others, when the cluster is more heavily loaded.
How can I optimize my query to get the same data with fewer buckets?
First of all, why do you need that many in one go? That's a humanly impossible number of results to go through. Try using a composite aggregation to "paginate" through the results.
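A minimal sketch of that pagination, under some assumptions: the three filters are moved into the `query` section so the composite aggregation can sit at the top level, the aggregation name `actions_per_video_per_user` and the helper names are made up for illustration, and `search` is a hypothetical callable that sends the body to `/videos/_search` and returns the parsed JSON response. The `missing` handling from the original `videos` aggregation is not reproduced here.

```python
from typing import Callable, Iterator, List, Optional

AGG_NAME = "actions_per_video_per_user"  # hypothetical name


def build_body(video_ids: List[str], usernames: List[str],
               after: Optional[dict] = None, page_size: int = 1000) -> dict:
    """Build one page request for the composite aggregation."""
    composite = {
        "size": page_size,  # buckets returned per request
        "sources": [
            {"username": {"terms": {"field": "username.keyword"}}},
            {"video": {"terms": {"field": "videoId.keyword"}}},
            {"action": {"terms": {"field": "actionType.keyword"}}},
        ],
    }
    if after is not None:
        # Resume after the last key of the previous page.
        composite["after"] = after
    return {
        "size": 0,  # no hits, aggregations only
        "query": {"bool": {"filter": [
            {"terms": {"videoId.keyword": video_ids}},
            {"terms": {"username.keyword": usernames}},
            {"term": {"success": True}},
        ]}},
        "aggs": {AGG_NAME: {"composite": composite}},
    }


def iter_buckets(search: Callable[[dict], dict], video_ids: List[str],
                 usernames: List[str], page_size: int = 1000) -> Iterator[dict]:
    """Yield every (username, video, action) bucket, one page at a time."""
    after = None
    while True:
        resp = search(build_body(video_ids, usernames, after, page_size))
        agg = resp["aggregations"][AGG_NAME]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            return
```

Each yielded bucket carries a `key` with the username, video, and action values plus a `doc_count`, so the same data comes back flattened, and no single request returns more than `page_size` buckets.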