
Elasticsearch: filter top hits aggregation

Say I have an Elasticsearch index with a bunch of users' comments:

{ "name": "chris", "date": "2016-01-01", "msg": "hi, foo"}
{ "name": "chris", "date": "2016-01-05", "msg": "bye, bar"}
{ "name": "aaron", "date": "2016-01-10", "msg": "who's bar"}
{ "name": "aaron", "date": "2016-01-15", "msg": "not foo"}

First, I want to find the latest comment for each user. I can do that with the top_hits aggregation:

{
  "aggs": {
    "name": {
      "terms": { "field": "name" },
      "aggs": {
        "latest_comment": {
          "top_hits": {
            "sort": [ { "date": { "order": "desc" } } ],
            "size": 1
          }
        }
      }
    }
  }
}

Which effectively gives me the following:

{ "name": "chris", "date": "2016-01-05", "msg": "bye, bar"}
{ "name": "aaron", "date": "2016-01-15", "msg": "not foo"}

But how can I filter those results now? To be super clear, I want to filter after the top_hits aggregation has picked the latest hits, not before.

Thank you.

cjbottaro asked Apr 13 '16 01:04



1 Answer

I had exactly the same question. After a lot of searching, this is what I found:

If you want to filter the top hits results based on a numeric metric, you can use a pipeline aggregation such as bucket selector. This is roughly how you implement a SQL HAVING clause in Elasticsearch. A very helpful answer for this case can be found under "implementing HAVING in elasticsearch".
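As a rough sketch of the bucket selector approach, the snippet below builds a request body that keeps only the users whose latest comment is on or after a cutoff date. It assumes `date` is mapped as a date field (so `max` returns epoch milliseconds) and that Painless is the default scripting language. The names `latest_date`, `only_recent`, `build_having_query` and the cutoff value are made up for illustration and are not from the question.

```python
import json
from datetime import datetime, timezone

# Hypothetical cutoff: keep users whose latest comment is on or after 2016-01-10 (UTC).
CUTOFF_MS = int(datetime(2016, 1, 10, tzinfo=timezone.utc).timestamp() * 1000)


def build_having_query(cutoff_ms: int) -> dict:
    """Build a search body that drops user buckets whose newest comment is older than cutoff_ms."""
    return {
        "size": 0,  # only the aggregation results are needed
        "aggs": {
            "name": {
                "terms": {"field": "name"},
                "aggs": {
                    # Same top_hits aggregation as in the question.
                    "latest_comment": {
                        "top_hits": {
                            "sort": [{"date": {"order": "desc"}}],
                            "size": 1,
                        }
                    },
                    # Numeric metric the selector can read: newest date per user.
                    "latest_date": {"max": {"field": "date"}},
                    # Acts like SQL HAVING: buckets where the script is false are removed.
                    "only_recent": {
                        "bucket_selector": {
                            "buckets_path": {"latest": "latest_date"},
                            "script": f"params.latest >= {cutoff_ms}L",
                        }
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_having_query(CUTOFF_MS), indent=2))
```

With the four sample comments from the question, this should leave only the aaron bucket, since chris's latest comment (2016-01-05) is before the cutoff. The filter still runs on a numeric value (the max date), not on the top hit's other fields.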

But if the metric you want to filter on is not numeric, there is no way (at least as of v6.2.4) to do that on the Elasticsearch side.

In that case, as @ismail said, you need to do the filtering client-side, in your own software.
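A minimal sketch of client-side filtering, assuming the response is already parsed into a dict and uses the aggregation names from the question (`name`, `latest_comment`). The `filter_latest_hits` helper and the hand-written `sample_response` are illustrative only; a real response carries more fields per hit.

```python
from typing import Callable, Iterator


def filter_latest_hits(
    response: dict, predicate: Callable[[dict], bool]
) -> Iterator[dict]:
    """Yield the latest comment of each user bucket if it satisfies predicate."""
    for bucket in response["aggregations"]["name"]["buckets"]:
        hits = bucket["latest_comment"]["hits"]["hits"]
        # top_hits was requested with size 1, so at most one hit per bucket.
        if hits and predicate(hits[0]["_source"]):
            yield hits[0]["_source"]


# Hand-written stand-in for the aggregation part of a search response.
sample_response = {
    "aggregations": {
        "name": {
            "buckets": [
                {
                    "key": "chris",
                    "doc_count": 2,
                    "latest_comment": {"hits": {"hits": [
                        {"_source": {"name": "chris", "date": "2016-01-05", "msg": "bye, bar"}}
                    ]}},
                },
                {
                    "key": "aaron",
                    "doc_count": 2,
                    "latest_comment": {"hits": {"hits": [
                        {"_source": {"name": "aaron", "date": "2016-01-15", "msg": "not foo"}}
                    ]}},
                },
            ]
        }
    }
}

# Keep only latest comments that mention "foo".
kept = list(filter_latest_hits(sample_response, lambda doc: "foo" in doc["msg"]))
print(kept)
```

Here only aaron's comment is kept. Filtering on "foo" before the aggregation would instead have returned chris's older "hi, foo" comment as his latest, which is the behaviour the question wants to avoid.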

hossein shemshadi answered Sep 18 '22 10:09