I'm doing map clustering using the Elasticsearch GeoHash grid aggregation. The query returns 100-200 buckets on average. Each bucket has a top_hits sub-aggregation, which I use to return 3 documents for each aggregated cluster.
The problem is that I want to return top_hits only when the parent aggregation (GeoHash) aggregates no more than 3 documents.
If a cluster aggregates more than 3 documents, I don't want ES to return any documents for this cluster (because I'm not going to use them).
I've tried to use the Bucket Selector Aggregation, but didn't manage to construct a correct buckets_path.
I use bucket selector aggregation on the same level as top_hits aggregation.
The total number of documents for a bucket is available at top_hits.hits.total, but what I'm getting is: reason=path not supported for [top_hits]: [hits.total].
Is this possible in Elasticsearch? It's important for me, because in most queries only a small percentage of buckets will have 3 or fewer documents. But the top_hits sub-aggregation always returns the top 3 documents, even for clusters of 1000 documents. If a query returns 200 buckets and only 5 of them aggregate <= 3 documents, I want to return only 5*3 documents, not 200*3 (the response is 10MB in this case).
Here is the aggs part of my query:
"clusters": {
  "geohash_grid": {
    "field": "coordinates",
    "precision": 3
  },
  "aggs": {
    "top_hits": {
      "top_hits": {
        "size": 3
      }
    },
    "top_hits_filter": {
      "bucket_selector": {
        "buckets_path": {
          "total_hits": "top_hits._count" // tried top_hits.hits.total
        },
        "script": {
          "inline": "total_hits <= 3"
        }
      }
    }
  }
}
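Try this, @ilivewithian. It points buckets_path at the special _count path (the document count of the geohash bucket itself) and reads the variable through params in the script: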
"aggs": {
  "clusters": {
    "geohash_grid": {
      "field": "coordinates",
      "precision": 3
    },
    "aggs": {
      "top_hits": {
        "top_hits": {
          "size": 3
        }
      },
      "top_hits_filter": {
        "bucket_selector": {
          "buckets_path": {
            "total_hits": "_count"
          },
          "script": {
            "inline": "params.total_hits <= 3"
          }
        }
      }
    }
  }
}
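If the threshold or precision changes per request, the same body can be built programmatically. The following is only a minimal sketch in Python: the helper name build_cluster_aggs and its parameters are hypothetical (they are not part of Elasticsearch or any client library), and it simply reproduces the JSON above as a dict with the threshold filled in.

```python
import json


def build_cluster_aggs(field="coordinates", precision=3, max_docs=3):
    """Hypothetical helper: build the "aggs" body shown above.

    Buckets whose document count (_count) exceeds max_docs are
    filtered out by the bucket_selector pipeline aggregation.
    """
    max_docs = int(max_docs)  # keep the generated script a plain integer comparison
    return {
        "clusters": {
            "geohash_grid": {"field": field, "precision": precision},
            "aggs": {
                # Return up to max_docs documents per remaining bucket.
                "top_hits": {"top_hits": {"size": max_docs}},
                # "_count" refers to the doc count of the enclosing geohash bucket.
                "top_hits_filter": {
                    "bucket_selector": {
                        "buckets_path": {"total_hits": "_count"},
                        "script": {
                            "inline": "params.total_hits <= %d" % max_docs
                        },
                    }
                },
            },
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted under "aggs" in a search request.
    print(json.dumps({"aggs": build_cluster_aggs()}, indent=2))
```

One thing to keep in mind: bucket_selector decides whether a bucket is kept at all, so clusters with more than 3 documents are dropped from the response entirely, including their doc_count. If the map still needs counts for the large clusters, one option (an assumption about your use case, not something tested here) is to add a second geohash_grid aggregation without top_hits next to this one.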