Say I have an Elasticsearch index with a bunch of users' comments:
{ "name": "chris", "date": "2016-01-01", "msg": "hi, foo"}
{ "name": "chris", "date": "2016-01-05", "msg": "bye, bar"}
{ "name": "aaron", "date": "2016-01-10", "msg": "who's bar"}
{ "name": "aaron", "date": "2016-01-15", "msg": "not foo"}
First, I want to find the latest comment for each user. I can do that with the top_hits
aggregation:
{
  "aggs": {
    "name": {
      "terms": { "field": "name" },
      "aggs": {
        "latest_comment": {
          "top_hits": {
            "sort": [ { "date": { "order": "desc" } } ],
            "size": 1
          }
        }
      }
    }
  }
}
Which effectively gives me the following:
{ "name": "chris", "date": "2016-01-05", "msg": "bye, bar"}
{ "name": "aaron", "date": "2016-01-15", "msg": "not foo"}
But how can I filter those results now? To be super clear, I want to filter after the top_hits
aggregation has picked the latest hits, not before.
Thank you.
I had the exact same question. After a lot of searching, this is what I found:
If you want to filter the top hits results based on a numeric metric, you can use a pipeline aggregation such as bucket selector. This is roughly the equivalent of a SQL HAVING clause in Elasticsearch. A very helpful answer for this case can be found in "implementing HAVING in elasticsearch".
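For the numeric case, here is a minimal sketch of what that could look like for the index in the question, written as a Python dict you would send as the search body. It assumes that `date` is mapped as a date field (so a `max` aggregation on it returns epoch milliseconds) and that your version accepts `source` and `params` in the script. The names `latest_date`, `recent_only`, `cutoff` and `build_query` are made up for the example.

```python
from datetime import datetime, timezone


def build_query(cutoff: datetime) -> dict:
    """Build a search body that keeps only users whose latest comment is on or after `cutoff`."""
    cutoff_ms = int(cutoff.timestamp() * 1000)  # max on a date field yields epoch millis
    return {
        "size": 0,  # only the aggregation results are needed
        "aggs": {
            "name": {
                "terms": {"field": "name"},
                "aggs": {
                    # Same top_hits as in the question: the latest comment per user.
                    "latest_comment": {
                        "top_hits": {
                            "sort": [{"date": {"order": "desc"}}],
                            "size": 1,
                        }
                    },
                    # Numeric metric that mirrors the top hit's sort key.
                    "latest_date": {"max": {"field": "date"}},
                    # HAVING-like step: drop buckets whose metric fails the script.
                    "recent_only": {
                        "bucket_selector": {
                            "buckets_path": {"latest": "latest_date"},
                            "script": {
                                "source": "params.latest >= params.cutoff",
                                "params": {"cutoff": cutoff_ms},
                            },
                        }
                    },
                },
            }
        },
    }


query = build_query(datetime(2016, 1, 10, tzinfo=timezone.utc))
```

With the four sample comments, I would expect this to drop the chris bucket (latest comment on 2016-01-05) and keep aaron. Note that the filter works on the `max` metric, not on the top hit itself, which is why this only helps when the condition can be expressed as a number.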
But if the metric you want to filter on is not numeric, there is no way (at least as of v6.2.4) to do that on the Elasticsearch side.
In that case, as @ismail said, you need to do the filtering client-side in your own software.
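A minimal sketch of the client-side approach, assuming the response comes back as a parsed dict in the usual shape for a terms aggregation with a nested top_hits. The helper `filter_latest` and the hand-written `sample_response` are hypothetical and only there for illustration.

```python
from typing import Callable, Dict, List


def filter_latest(response: Dict, predicate: Callable[[Dict], bool]) -> List[Dict]:
    """Return the latest comment per user, keeping only those that satisfy `predicate`."""
    kept = []
    for bucket in response["aggregations"]["name"]["buckets"]:
        hits = bucket["latest_comment"]["hits"]["hits"]
        if not hits:
            continue  # defensive: skip a bucket with no top hit
        doc = hits[0]["_source"]  # size is 1, so there is a single top hit
        if predicate(doc):
            kept.append(doc)
    return kept


# Hand-written stand-in for the aggregation part of a search response.
sample_response = {
    "aggregations": {
        "name": {
            "buckets": [
                {
                    "key": "chris",
                    "doc_count": 2,
                    "latest_comment": {"hits": {"hits": [
                        {"_source": {"name": "chris", "date": "2016-01-05", "msg": "bye, bar"}}
                    ]}},
                },
                {
                    "key": "aaron",
                    "doc_count": 2,
                    "latest_comment": {"hits": {"hits": [
                        {"_source": {"name": "aaron", "date": "2016-01-15", "msg": "not foo"}}
                    ]}},
                },
            ]
        }
    }
}

# Keep only the latest comments that mention "foo".
foo_comments = filter_latest(sample_response, lambda doc: "foo" in doc["msg"])
```

Because the filter runs after top_hits has already picked the latest comment per user, chris is dropped here even though one of his older comments mentions "foo", which is the behaviour the question asks for.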