{'country': 'France', 'collected': '2018-03-12', 'active': true}
{'country': 'France', 'collected': '2018-03-13', 'active': true}
{'country': 'France', 'collected': '2018-03-14', 'active': false}
{'country': 'Canada', 'collected': '2018-02-01', 'active': false}
{'country': 'Canada', 'collected': '2018-02-02', 'active': true}
Say I have this result set and want to group the rows by country, keeping only the most recent row for each country. After grouping, the result is:
{'country': 'France', 'collected': '2018-03-14', 'active': false}
{'country': 'Canada', 'collected': '2018-02-02', 'active': true}
But I want to exclude a country when its most recent row has active set to false (older rows for the same country can be true or false; only the most recent row matters). How can I do that in Elasticsearch? Here is my query:
POST /test/_search?search_type=count
{
  "aggs": {
    "group": {
      "terms": {
        "field": "country"
      },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "collected": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
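To make the desired outcome concrete, here is a minimal client-side sketch in Python of the rule described above: keep each country's most recent row, then drop the country if that row is inactive. It is not an Elasticsearch query; the function name `latest_active_per_country` is hypothetical, and the rows are the sample data from the question with JSON booleans written as Python booleans.

```python
# Sample rows from the question (JSON true/false written as Python booleans).
rows = [
    {"country": "France", "collected": "2018-03-12", "active": True},
    {"country": "France", "collected": "2018-03-13", "active": True},
    {"country": "France", "collected": "2018-03-14", "active": False},
    {"country": "Canada", "collected": "2018-02-01", "active": False},
    {"country": "Canada", "collected": "2018-02-02", "active": True},
]


def latest_active_per_country(rows):
    """Keep each country's most recent row, but only if that row is active.

    Hypothetical helper used only to illustrate the expected result.
    """
    latest = {}
    for row in rows:
        current = latest.get(row["country"])
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if current is None or row["collected"] > current["collected"]:
            latest[row["country"]] = row
    # Drop countries whose most recent row is inactive.
    return [row for row in latest.values() if row["active"]]


print(latest_active_per_country(rows))
# [{'country': 'Canada', 'collected': '2018-02-02', 'active': True}]
```

With the sample data, France is dropped because its most recent row (2018-03-14) is inactive, and only Canada remains.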
A top_hits metric aggregator keeps track of the most relevant document being aggregated. This aggregator is intended to be used as a sub aggregator, so that the top matching documents can be aggregated per bucket.
sum_other_doc_count is the number of documents that didn't make it into the top size terms.
Bucket aggregations don't calculate metrics over fields like the metrics aggregations do, but instead, they create buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) which determines whether or not a document in the current context "falls" into it.
I think you can get away with sorting by two fields in your top_hits: by active and by collected. Basically, you want the true values to come first and, when active is equal, the sort to fall back to collected. Something like the following will always show the active:true documents sorted by collected.
The only downside to this solution is that if you don't have any active documents, top_hits will show one active:false document.
{
  "size": 0,
  "aggs": {
    "group": {
      "terms": {
        "field": "country"
      },
      "aggs": {
        "group_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "active": {
                  "order": "desc"
                }
              },
              {
                "collected": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
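To see what this two-field sort returns on the sample data, here is a rough Python simulation of a size-1 top_hits per country, ordered by active descending and then collected descending. It only mimics the ordering and does not call Elasticsearch; the function name `top_hit_per_country` is hypothetical.

```python
# Sample rows from the question (JSON true/false written as Python booleans).
rows = [
    {"country": "France", "collected": "2018-03-12", "active": True},
    {"country": "France", "collected": "2018-03-13", "active": True},
    {"country": "France", "collected": "2018-03-14", "active": False},
    {"country": "Canada", "collected": "2018-02-01", "active": False},
    {"country": "Canada", "collected": "2018-02-02", "active": True},
]


def top_hit_per_country(rows):
    """Mimic top_hits (size 1) sorted by active desc, then collected desc.

    Hypothetical helper; it approximates the sort order only.
    """
    best = {}
    for row in rows:
        # Tuples compare element by element; True ranks above False,
        # and ISO dates compare correctly as plain strings.
        key = (row["active"], row["collected"])
        current = best.get(row["country"])
        if current is None or key > (current["active"], current["collected"]):
            best[row["country"]] = row
    return best


for country, row in top_hit_per_country(rows).items():
    print(country, row["collected"], row["active"])
# France 2018-03-13 True
# Canada 2018-02-02 True
```

Note that France still appears, represented by its most recent active row (2018-03-13), even though its latest row overall (2018-03-14) is inactive. If such countries must be removed entirely, the buckets would need an additional filtering step.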