Using Elasticsearch's histogram aggregation, I can put various 'ranges' of data into buckets by specifying an interval, in this case 50:
Price: 0-50 50-100 100-150 150-200 200-250 etc
This works fine, but it returns an awfully long list of buckets. What I'd prefer is:
0-50 50-100 100-200 200-400 400-1000 1000+
Or something like that. Is it possible to tell ES which intervals (ranges) it should return?
You need to use the numeric range aggregation, which allows you to specify exactly which intervals you want, such as this:
{
  "aggs" : {
    "price_ranges" : {
      "range" : {
        "field" : "price",
        "ranges" : [
          { "to" : 50 },
          { "from" : 50, "to" : 100 },
          { "from" : 100, "to" : 200 },
          { "from" : 200, "to" : 400 },
          { "from" : 400, "to" : 1000 },
          { "from" : 1000 }
        ]
      }
    }
  }
}
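This should give you exactly what you expect. Note that each range includes its "from" value and excludes its "to" value, so a price of exactly 50 lands in the 50-100 bucket.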
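If the boundaries change often, the request body can be generated rather than written by hand. The following is only a minimal sketch in Python: the helper name build_range_agg and its parameters are made up for illustration and are not part of any Elasticsearch client. It just produces the same JSON structure as the query above from a sorted list of boundaries.

```python
import json


def build_range_agg(field, edges, name="price_ranges"):
    """Build a range aggregation body from sorted bucket boundaries.

    Hypothetical helper for illustration only; it constructs a plain dict
    and does not talk to Elasticsearch.
    """
    edges = list(edges)
    if not edges or edges != sorted(set(edges)):
        raise ValueError("edges must be non-empty, sorted and unique")

    # Open-ended first bucket: everything below the first boundary.
    ranges = [{"to": edges[0]}]
    # One bucket per pair of neighbouring boundaries.
    for lower, upper in zip(edges, edges[1:]):
        ranges.append({"from": lower, "to": upper})
    # Open-ended last bucket: everything from the last boundary upwards.
    ranges.append({"from": edges[-1]})

    return {"aggs": {name: {"range": {"field": field, "ranges": ranges}}}}


body = build_range_agg("price", [50, 100, 200, 400, 1000])
print(json.dumps(body, indent=2))
```

The resulting dict can be serialized and sent as the search request body with whatever client you already use.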
The documentation explicitly says of the histogram aggregation that it "dynamically builds fixed size (a.k.a. interval) buckets over the values", so it cannot produce variable-width buckets directly.
What I can think of is that, to reduce the number of buckets, you can apply a logarithmic scale (or any other non-linear scale, e.g. square root, that gives enough granularity for your particular dataset) to the values using the script option:
{
  "aggs": {
    "prices": {
      "histogram": {
        "field": "price",
        "script": "Math.log10(_value)",
        "interval": 1
      }
    }
  }
}
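This will give buckets with keys 0, 1, 2,… that stand for original values within the intervals [1;10), [10;100), [100;1000),… Prices below 1 land in negative-keyed buckets, and a price of 0 has no finite logarithm, so it needs separate handling.
By applying the inverse function (10^x in this case) to the keys on the client side, you can restore the original scale: key k covers [10^k; 10^(k+1)).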
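The client-side step can be sketched in Python as follows. This assumes the histogram places each value in the bucket whose key is floor(log10(value)), which is what an interval of 1 on the log-scaled values amounts to; the function names are made up for illustration.

```python
import math


def log_bucket_key(value):
    """Key of the bucket a value would fall into, assuming an interval of 1
    over log10-transformed values (illustrative, not an Elasticsearch call)."""
    if value <= 0:
        raise ValueError("log10 is only defined for positive values")
    return math.floor(math.log10(value))


def key_to_interval(key):
    """Map a bucket key back to the original scale as (lower, upper),
    where lower is inclusive and upper is exclusive."""
    return (10 ** key, 10 ** (key + 1))


for price in (5, 75, 250):
    key = log_bucket_key(price)
    lower, upper = key_to_interval(key)
    print(f"price {price} -> key {key} -> [{lower}; {upper})")
```

Since the bucket widths grow by a factor of 10 at each step, this only suits data where that coarse granularity is acceptable; a different transform needs its own inverse on the client side.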