Aggregate over top hits ElasticSearch

Tags:

elasticsearch

My documents are structured in the following way:

{
   "chefInfo": {
      "id": int,
      "employed": String,
      ... Some more recipe information ...
   },
   "recipe": {
      ... Some recipe information ...
   }
}

If a chef has multiple recipes, the nested chefInfo block is identical in each of those documents. I want to aggregate on a field in the chefInfo part of the document, but the aggregation doesn't account for the fact that the chefInfo block is duplicated.

So, if the chef with id 1 appears on 5 recipes and I aggregate on the employed field, that chef contributes 5 to the counts, whereas I want them to be counted only once.
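To make the double counting concrete, here is a small in-memory Python sketch (not Elasticsearch code) using made-up documents; the "yes"/"no" values for employed are hypothetical. Counting once per document is roughly what a plain terms aggregation on employed reports as its doc_count.

```python
from collections import Counter

# Hypothetical sample documents: chef 1 is on five recipes, chefs 2 and 3 on one each.
# Only the chefInfo block is shown; the recipe part is omitted.
docs = [{"chefInfo": {"id": 1, "employed": "yes"}} for _ in range(5)]
docs.append({"chefInfo": {"id": 2, "employed": "yes"}})
docs.append({"chefInfo": {"id": 3, "employed": "no"}})

# Counting once per document: chef 1 alone adds 5 to the "yes" count.
per_document = Counter(d["chefInfo"]["employed"] for d in docs)

# What the question actually wants for "yes": the number of distinct employed chefs.
employed_chefs = {d["chefInfo"]["id"] for d in docs if d["chefInfo"]["employed"] == "yes"}

print(dict(per_document))   # {'yes': 6, 'no': 1}
print(len(employed_chefs))  # 2
```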

I thought about doing a top_hits aggregation on chef_id and then running a sub-aggregation over all of the buckets, but I can't work out how to compute the counts over the results of all the buckets.

Is what I want to do possible?

asked May 20 '19 11:05

Haych

1 Answer

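In Elasticsearch, every document is counted on its own, so identical chefInfo blocks in different documents are counted separately. In your case, you want uniqueness to be defined by a different field, here chefInfo.id. To count unique values of that field, use a cardinality aggregation as a sub-aggregation of the terms aggregation on employed.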
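As a rough in-memory analogy (plain Python, not Elasticsearch code), the sketch below shows what a terms bucket on employed with a cardinality sub-aggregation on chefInfo.id computes for each bucket. The function name terms_with_cardinality and the sample documents are hypothetical, and the distinct count here is exact, whereas Elasticsearch's cardinality aggregation is approximate.

```python
from collections import defaultdict


def terms_with_cardinality(docs):
    """Hypothetical emulation of terms(chefInfo.employed) + cardinality(chefInfo.id).

    Returns {employed_value: {"doc_count": n, "employed_unique": m}}.
    Unlike Elasticsearch's cardinality aggregation, this count is exact.
    """
    doc_counts = defaultdict(int)
    chef_ids = defaultdict(set)
    for doc in docs:
        info = doc["chefInfo"]
        doc_counts[info["employed"]] += 1            # one per document, like doc_count
        chef_ids[info["employed"]].add(info["id"])   # distinct chef ids per bucket
    return {
        value: {"doc_count": doc_counts[value], "employed_unique": len(chef_ids[value])}
        for value in doc_counts
    }


# Hypothetical sample: chef 1 on five recipes, chefs 2 and 3 on one each.
docs = [{"chefInfo": {"id": 1, "employed": "yes"}} for _ in range(5)]
docs += [{"chefInfo": {"id": 2, "employed": "yes"}}, {"chefInfo": {"id": 3, "employed": "no"}}]

result = terms_with_cardinality(docs)
print(result["yes"])  # {'doc_count': 6, 'employed_unique': 2}
print(result["no"])   # {'doc_count': 1, 'employed_unique': 1}
```

The doc_count still reflects the duplicated documents; only employed_unique collapses them to one per chef.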

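Assuming chefInfo is mapped as a nested field and employed has a keyword sub-field (as the default dynamic mapping creates for strings), you can apply the aggregation as below. If chefInfo is a plain object field instead, drop the nested wrapper and keep the inner terms aggregation: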

{
  "aggs": {
    "employed": {
      "nested": {
        "path": "chefInfo"
      },
      "aggs": {
        "employed": {
          "terms": {
            "field": "chefInfo.employed.keyword"
          },
          "aggs": {
            "employed_unique": {
              "cardinality": {
                "field": "chefInfo.id"
              }
            }
          }
        }
      }
    }
  }
}

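In the response, the employed_unique value inside each employed bucket gives the number of distinct chefs, which is the count you want. Note that the cardinality aggregation is approximate; counts below its precision_threshold (3000 by default) are expected to be close to accurate.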
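A minimal Python sketch of pulling employed_unique out of each bucket, assuming the search response is already available as a dict (for example, parsed from the JSON body). The helper name unique_chefs_per_employed and the numbers in the sample response are hypothetical; only the nesting of keys follows the query above (nested aggregation, then terms buckets, then the cardinality value).

```python
def unique_chefs_per_employed(response):
    """Hypothetical helper: map each employed value to its employed_unique count.

    Assumes the shape produced by the query above:
    aggregations -> employed (nested) -> employed (terms) -> buckets,
    with an employed_unique cardinality result in each bucket.
    """
    buckets = response["aggregations"]["employed"]["employed"]["buckets"]
    return {b["key"]: b["employed_unique"]["value"] for b in buckets}


# Hypothetical, trimmed response body; the numbers are made up for illustration.
sample_response = {
    "aggregations": {
        "employed": {
            "doc_count": 7,
            "employed": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "yes", "doc_count": 6, "employed_unique": {"value": 2}},
                    {"key": "no", "doc_count": 1, "employed_unique": {"value": 1}},
                ],
            },
        }
    }
}

print(unique_chefs_per_employed(sample_response))  # {'yes': 2, 'no': 1}
```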

answered Sep 18 '22 09:09

Nishant
