 

How to get specific _source fields in aggregation

I am exploring Elasticsearch for use in an application that will handle large volumes of data and generate statistical results over them. My requirement is to retrieve certain statistics for a particular field. For example, for a given field, I would like to retrieve its unique values and the document frequency of each value, along with the length of the value. The value lengths are indexed along with each document. So far, I have experimented with the Terms Aggregation, using the following query:

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "type_count": {
      "terms": {
        "field": "val.keyword",
        "size": 100
      }
    }
  }
}

The query returns all the values in the field val, together with the number of documents in which each value occurs. I would like the field val_len to be returned as well. Is it possible to achieve this using Elasticsearch? In other words, is it possible to include specific _source fields in buckets? I have looked through the documentation available online, but I haven't found a solution yet. I'm hoping somebody can point me in the right direction. Thanks in advance!

I tried to include _source in the following ways:

"aggs": {
  "type_count": {
    "terms": {
      "field": "val.keyword",
      "size": 100
    },
    "_source": ["val_len"]
  }
}

and

"aggs": {
  "type_count": {
    "terms": {
      "field": "val.keyword",
      "size": 100,
      "_source": ["val_len"]
    }
  }
}

But I guess this isn't the right way, because both gave me parsing errors.

Poonam Anthony asked Feb 12 '19 11:02





1 Answer

You need to use a sub-aggregation called top_hits, which returns the requested _source fields of the top document(s) in each bucket, like this:

"aggs": {
  "type_count": {
    "terms": {
      "field": "val.keyword",
      "size": 100
    },
    "aggs": {
      "hits": {
        "top_hits": {
          "_source": ["val_len"],
          "size": 1
        }
      }
    }
  }
}
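
To read val_len back out of the response, note that the top_hits result is nested under the sub-aggregation name ("hits" here), then under the usual hits.hits array. Here is a minimal Python sketch, assuming the response has already been parsed into a dict. The helper name extract_val_lengths and the sample data are made up for illustration, and the sample is trimmed to the keys the helper reads:

```python
def extract_val_lengths(response):
    """Return (value, doc_count, val_len) for each bucket of type_count."""
    rows = []
    for bucket in response["aggregations"]["type_count"]["buckets"]:
        # "hits" (sub-aggregation name) -> "hits" (top_hits body) -> "hits" (document list)
        docs = bucket["hits"]["hits"]["hits"]
        val_len = docs[0]["_source"]["val_len"] if docs else None
        rows.append((bucket["key"], bucket["doc_count"], val_len))
    return rows


# Hypothetical, hand-written response fragment (not real cluster output).
sample_response = {
    "aggregations": {
        "type_count": {
            "buckets": [
                {"key": "foo", "doc_count": 3,
                 "hits": {"hits": {"hits": [{"_source": {"val_len": 3}}]}}},
                {"key": "hello", "doc_count": 1,
                 "hits": {"hits": {"hits": [{"_source": {"val_len": 5}}]}}},
            ]
        }
    }
}

for value, doc_count, val_len in extract_val_lengths(sample_response):
    print(value, doc_count, val_len)
```

Because the sub-aggregation is itself named "hits", the path ends up with three "hits" keys in a row. Giving it a different name would make the first one differ.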

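Another way is to use an avg sub-aggregation on val_len, which also lets you sort the buckets by length. Since every document in a bucket shares the same val, its val_len should be the same too, so the average is simply that length: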

"aggs": {
  "type_count": {
    "terms": {
      "field": "val.keyword",
      "size": 100,
      "order": {
        "length": "desc"
      }
    },
    "aggs": {
      "length": {
        "avg": {
          "field": "val_len"
        }
      }
    }
  }
}
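
With this variant, each bucket carries the metric under the sub-aggregation name, as length.value. A short Python sketch of reading it, again assuming an already-parsed response dict; the helper name extract_avg_lengths and the sample data are hypothetical:

```python
def extract_avg_lengths(response):
    """Return (value, doc_count, average val_len) for each bucket of type_count."""
    return [
        (bucket["key"], bucket["doc_count"], bucket["length"]["value"])
        for bucket in response["aggregations"]["type_count"]["buckets"]
    ]


# Hypothetical, hand-written response fragment (not real cluster output),
# listed longest-first as the "order" clause above would return it.
sample_avg_response = {
    "aggregations": {
        "type_count": {
            "buckets": [
                {"key": "hello", "doc_count": 1, "length": {"value": 5.0}},
                {"key": "foo", "doc_count": 3, "length": {"value": 3.0}},
            ]
        }
    }
}

for value, doc_count, avg_len in extract_avg_lengths(sample_avg_response):
    print(value, doc_count, avg_len)
```

The avg metric comes back as a floating-point number, so cast it to int if you need the length as an integer.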
Val answered Oct 02 '22 17:10
