Adding additional fields to ElasticSearch terms aggregation

Indexed documents look like this:

{
  id: 1,
  title: 'Blah',
  ...
  platform: {id: 84, url: 'http://facebook.com', title: 'Facebook'}
  ...
}

What I want is to count documents and output stats by platform. For counting, I can use a terms aggregation with platform.id as the field to count on:

aggs: {
  platforms: {
    terms: {field: 'platform.id'}
  }
}

This way I receive the stats as multiple buckets, each looking like {key: 8, doc_count: 162511}, as expected.
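A minimal sketch of reading those buckets on the client, assuming the parsed JSON body of the search response is available as a plain object. Elasticsearch returns aggregation results under `aggregations.<name>.buckets`, so with the aggregation named `platforms` as above, each bucket carries the platform id in `key` and the count in `doc_count`. The `platformCounts` helper and the `sample` response fragment are hypothetical.

```javascript
// Hypothetical helper: turn the terms-aggregation buckets into a Map of
// platform id -> document count. `response` is assumed to be the parsed
// JSON body of a search that used the `platforms` aggregation above.
function platformCounts(response) {
  const counts = new Map();
  for (const bucket of response.aggregations.platforms.buckets) {
    // `key` is the platform.id value, `doc_count` the number of matching docs.
    counts.set(bucket.key, bucket.doc_count);
  }
  return counts;
}

// Hand-written response fragment shaped like the bucket shown above.
const sample = {
  aggregations: {
    platforms: { buckets: [{ key: 8, doc_count: 162511 }] }
  }
};

console.log(platformCounts(sample).get(8)); // 162511
```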

Now, can I somehow also add platform.name and platform.url to those buckets (for pretty output of the stats)? The best I've come up with looks like this:

aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      name: {terms: {field: 'platform.name'}},
      url: {terms: {field: 'platform.url'}}
    }
  }
}

This does work, but it returns a rather complicated structure in each bucket:

{
  key: 7,
  doc_count: 528568,
  url: {
    doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{key: "http://facebook.com", doc_count: 528568}]
  },
  name: {
    doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{key: "Facebook", doc_count: 528568}]
  }
}

Of course, the platform's name and URL could be extracted from this structure (e.g. bucket.url.buckets.first.key), but is there a cleaner and simpler way to do this?
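For comparison, the extraction described in the question can be wrapped in a small helper. This is a sketch that assumes every document in a platform bucket shares the same name and URL, so each `name`/`url` sub-aggregation has exactly one bucket, as in the output above. The `flattenPlatformBucket` helper and `sampleBucket` data are hypothetical.

```javascript
// Hypothetical helper: flatten one `platforms` bucket that has `name` and
// `url` terms sub-aggregations into a flat record.
// Assumption: each sub-aggregation has a single bucket per platform.
function flattenPlatformBucket(bucket) {
  // Key of the first (by assumption, only) sub-bucket, or null if empty.
  const first = (agg) => (agg.buckets.length > 0 ? agg.buckets[0].key : null);
  return {
    id: bucket.key,
    count: bucket.doc_count,
    name: first(bucket.name),
    url: first(bucket.url)
  };
}

// The bucket shown above, written out as a JavaScript object.
const sampleBucket = {
  key: 7,
  doc_count: 528568,
  url: {
    doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{ key: "http://facebook.com", doc_count: 528568 }]
  },
  name: {
    doc_count_error_upper_bound: 0,
    sum_other_doc_count: 0,
    buckets: [{ key: "Facebook", doc_count: 528568 }]
  }
};

// Logs a flat object with id, count, name and url.
console.log(flattenPlatformBucket(sampleBucket));
```

It works, but it needs one extra terms aggregation per field, and it silently takes only the first value if a platform id ever maps to more than one name or URL.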

asked Oct 23 '15 at 12:10 by zverok


1 Answer

It seems the best way to express the intent is a top hits aggregation ("from each aggregated group, select only one document"), and then to extract the platform from that document:

aggs: {
  platforms: {
    terms: {field: 'platform.id'},
    aggs: {
      platform: {top_hits: {size: 1, _source: {include: ['platform']}}}
    }
  }
}
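As a sketch, here is the complete request body for this approach as a JavaScript object. The top-level `size: 0` is an addition not in the original answer; it asks Elasticsearch to skip returning the matching documents and send only the aggregation results. The `searchBody` and `payload` names are illustrative. The source-filtering key is kept as `include`, as in the answer; newer Elasticsearch versions document it as `includes`.

```javascript
// Sketch of the full search request body for the top_hits approach.
const searchBody = {
  size: 0, // return no top-level hits, only aggregation results
  aggs: {
    platforms: {
      terms: { field: "platform.id" },
      aggs: {
        // One representative document per platform bucket, with only
        // the `platform` object kept in its _source.
        platform: { top_hits: { size: 1, _source: { include: ["platform"] } } }
      }
    }
  }
};

// Serialized JSON that would be sent as the body of a _search request.
const payload = JSON.stringify(searchBody);
console.log(payload);
```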

This way, each bucket will look like:

{
  "key": 7,
  "doc_count": 529939,
  "platform": {
    "hits": {
      "hits": [{
        "_source": {
          "platform": {"id": 7, "name": "Facebook", "url": "http://facebook.com"}
        }
      }]
    }
  }
}

It's still rather deeply nested (as usual with ES), but clean: bucket.platform.hits.hits.first._source.platform
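A minimal sketch of that extraction over a whole response, assuming the parsed response body is a plain object with the `platforms` aggregation under `aggregations`. The sample below is hand-written and trimmed to the fields shown above (real hits also carry metadata such as `_index` and `_id`). The `platformStats` helper and `sampleResponse` are hypothetical.

```javascript
// Hypothetical helper: flatten each `platforms` bucket, whose `platform`
// sub-aggregation is a top_hits with size: 1, into one record per platform.
function platformStats(response) {
  return response.aggregations.platforms.buckets.map((bucket) => {
    // top_hits was limited to one document, so take the first hit.
    const hit = bucket.platform.hits.hits[0];
    // Copy id/name/url from the hit's _source and add the bucket's count.
    return { ...hit._source.platform, doc_count: bucket.doc_count };
  });
}

// Hand-written response fragment shaped like the bucket shown above.
const sampleResponse = {
  aggregations: {
    platforms: {
      buckets: [
        {
          key: 7,
          doc_count: 529939,
          platform: {
            hits: {
              hits: [
                { _source: { platform: { id: 7, name: "Facebook", url: "http://facebook.com" } } }
              ]
            }
          }
        }
      ]
    }
  }
};

console.log(platformStats(sampleResponse));
```

Because every non-empty bucket contains at least one document, `hits.hits[0]` is always present here, and the `_source` filter keeps only the `platform` object in the response.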

answered Sep 19 '22 at 23:09 by zverok
