
Aggregate over top hits ElasticSearch

My documents are structured in the following way:

{
   "chefInfo": {
      "id": int,
      "employed": String,
      ... Some more recipe information ...
   },
   "recipe": {
      ... Some recipe information ...
   }
}

If a chef has multiple recipes, the nested chefInfo block will be identical in each document. My problem is that I want to aggregate on a field in the chefInfo part of the document, but the aggregation doesn't take into account the fact that the chefInfo block is duplicated.

So, if the chef with id 1 appears on 5 recipes and I aggregate on the employed field, this chef accounts for 5 of the counts in the aggregation, whereas I want them to count only once.

I thought about doing a top_hits aggregation on chef_id and then a sub-aggregation over all of the buckets, but I can't work out how to do the counts over the results of all the buckets.

Is what I want to do possible?
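To make the over-counting concrete, here is a small sketch in plain Python (no Elasticsearch involved). The sample documents, the "yes"/"no" values for employed, and both function names are made up for illustration; only the chefInfo.id and chefInfo.employed field names come from the structure above.

```python
from collections import Counter

# Hypothetical sample data: one document per recipe, with the chefInfo block
# repeated on every recipe belonging to the same chef.
docs = [
    {"chefInfo": {"id": 1, "employed": "yes"}, "recipe": {"name": f"recipe-{i}"}}
    for i in range(5)
] + [
    {"chefInfo": {"id": 2, "employed": "no"}, "recipe": {"name": "recipe-5"}},
]


def count_per_document(docs):
    """Count 'employed' values once per document (the behaviour I get now)."""
    return Counter(d["chefInfo"]["employed"] for d in docs)


def count_per_chef(docs):
    """Count 'employed' values once per distinct chefInfo.id (what I want)."""
    employed_by_chef = {}
    for d in docs:
        employed_by_chef[d["chefInfo"]["id"]] = d["chefInfo"]["employed"]
    return Counter(employed_by_chef.values())
```

With this sample, the per-document count attributes five hits to the chef with id 1, while the per-chef count attributes one.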

Haych asked May 20 '19 11:05



1 Answer

For Elasticsearch, every document is unique in itself. In your case you want to define uniqueness based on a different field, here chefInfo.id. To get a unique count based on this field, you have to use the cardinality aggregation.

You can apply the aggregation as below:

{
  "aggs": {
    "employed": {
      "nested": {
        "path": "chefInfo"
      },
      "aggs": {
        "employed": {
          "terms": {
            "field": "chefInfo.employed.keyword"
          },
          "aggs": {
            "employed_unique": {
              "cardinality": {
                "field": "chefInfo.id"
              }
            }
          }
        }
      }
    }
  }
}

In the result, employed_unique gives you the expected count for each employed bucket.
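As a rough sketch of reading those values back, the Python helper below walks the aggregation part of a search response. The function name and the sample response are hypothetical (the numbers are invented, not real output); the nesting simply mirrors the aggregation names used in the query above. Keep in mind that cardinality returns an approximate count, so the values can be slightly off for fields with very many distinct ids.

```python
def unique_chefs_per_employed(response):
    """Map each 'employed' value to the distinct-chef count from the response."""
    # Outer "employed" is the nested aggregation, inner "employed" is the terms
    # aggregation, and "employed_unique" is the cardinality sub-aggregation.
    buckets = response["aggregations"]["employed"]["employed"]["buckets"]
    return {b["key"]: b["employed_unique"]["value"] for b in buckets}


# Hypothetical response fragment, shaped like the result of the query above.
sample_response = {
    "aggregations": {
        "employed": {
            "doc_count": 6,
            "employed": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "yes", "doc_count": 5, "employed_unique": {"value": 1}},
                    {"key": "no", "doc_count": 1, "employed_unique": {"value": 1}},
                ],
            },
        }
    }
}

unique_counts = unique_chefs_per_employed(sample_response)
```

Here doc_count still reflects the number of matching documents per bucket, while employed_unique holds the per-chef figure.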

Nishant answered Sep 18 '22 09:09