I'm attempting to find tags related to the one currently being viewed. Every document in our index is tagged. Each tag is formed of two parts, an ID and a text name:
{ ... meta: { ... tags: [ { id: 123, name: 'Biscuits' }, { id: 456, name: 'Cakes' }, { id: 789, name: 'Breads' } ] } }
To fetch the related tags I am simply querying the documents and getting an aggregate of their tags:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "item.meta.tags.id": "123" } },
        { ... }
      ]
    }
  },
  "aggs": {
    "baked_goods": {
      "terms": {
        "field": "item.meta.tags.id",
        "min_doc_count": 2
      }
    }
  }
}
This works perfectly and I am getting the results I want. However, I require both the tag ID and name to do anything useful. I have explored how to accomplish this, and the solutions seem to be:
Options one and two are not available to me, so I have been going with option three, but it is not behaving as expected. Given the following query (still searching for documents also tagged with 'Biscuits'):
{
  ...
  "aggs": {
    "baked_goods": {
      "terms": {
        "field": "item.meta.tags.id",
        "min_doc_count": 2
      },
      "aggs": {
        "name": {
          "terms": { "field": "item.meta.tags.name" }
        }
      }
    }
  }
}
I will get this result:
{
  ...
  "aggregations": {
    "baked_goods": {
      "buckets": [
        {
          "key": "456",
          "doc_count": 11,
          "name": {
            "buckets": [
              { "key": "Biscuits", "doc_count": 11 },
              { "key": "Cakes", "doc_count": 11 }
            ]
          }
        }
      ]
    }
  }
}
The nested aggregation includes both the search term and the tag I'm after (returned in alphabetical order).
I have tried to mitigate this by adding an exclude to the nested aggregation, but this slowed the query down far too much (around 100 times for 500000 docs). So far the fastest solution is to de-dupe the result manually.
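For illustration, the manual de-duplication mentioned above might look roughly like the following. This is only a sketch under assumptions: the helper name `pick_tag_names` is hypothetical, the response shape is taken from the result shown earlier, and picking "the first name that is not the searched tag" is a heuristic that can mis-pair IDs and names when several tags co-occur equally often.

```python
def pick_tag_names(response, searched_name, searched_id,
                   agg="baked_goods", sub_agg="name"):
    """Heuristically pair each tag ID bucket with a tag name.

    For every ID bucket, drop the searched tag's name from the name
    sub-buckets and keep the first remaining one. The bucket for the
    searched tag itself is skipped.
    """
    pairs = {}
    for bucket in response["aggregations"][agg]["buckets"]:
        if str(bucket["key"]) == str(searched_id):
            continue  # the tag being viewed is not a "related" tag
        names = [b["key"] for b in bucket[sub_agg]["buckets"]]
        candidates = [n for n in names if n != searched_name]
        pairs[bucket["key"]] = candidates[0] if candidates else None
    return pairs


# Response shaped like the result shown above.
response = {
    "aggregations": {
        "baked_goods": {
            "buckets": [
                {
                    "key": "456",
                    "doc_count": 11,
                    "name": {
                        "buckets": [
                            {"key": "Biscuits", "doc_count": 11},
                            {"key": "Cakes", "doc_count": 11},
                        ]
                    },
                }
            ]
        }
    }
}

print(pick_tag_names(response, "Biscuits", "123"))
# {'456': 'Cakes'}
```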
What is the best way to get an aggregation of tags with both the tag ID and tag name in the response?
Thanks for making it this far!
Elasticsearch Aggregations provide you with the ability to group and perform calculations and statistics (such as sums and averages) on your data by using a simple search query. An aggregation can be viewed as a working unit that builds analytical information across a set of documents.
Sub-aggregations are computed for each bucket that their parent aggregation generates. There is no hard limit on the level/depth of nested aggregations (one can nest an aggregation under a "parent" aggregation, which is itself a sub-aggregation of another higher-level aggregation).
Elasticsearch is a powerful search engine that can be used to get distinct values. To get started, you need to create an index and specify the mapping for the fields you want to search. Then, you can use the search API to query the index and get the distinct values for the fields you want.
By the looks of it, your tags field is not nested. For this aggregation to work, you need it to be nested so that there is an association between an id and a name. Without nested, the list of ids is just an array and the list of names is another array:
"item": {
  "properties": {
    "meta": {
      "properties": {
        "tags": {
          "type": "nested",            <-- nested field
          "include_in_parent": true,   <-- to, also, keep the flat array-like structure
          "properties": {
            "id": { "type": "integer" },
            "name": { "type": "string" }
          }
        }
      }
    }
  }
}
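To picture why the association is lost without nested, here is a small sketch in plain Python (no Elasticsearch client involved). The function `flatten_object_array` is a hypothetical helper that only mimics how an array of objects ends up as one multi-valued list per field. It is not how Elasticsearch is implemented internally.

```python
from itertools import product


def flatten_object_array(tags, prefix="tags"):
    """Mimic a non-nested array of objects: one value list per field."""
    flat = {}
    for tag in tags:
        for field, value in tag.items():
            flat.setdefault(f"{prefix}.{field}", []).append(value)
    return flat


tags = [{"id": 123, "name": "Biscuits"}, {"id": 456, "name": "Cakes"}]
flat = flatten_object_array(tags)
print(flat)
# {'tags.id': [123, 456], 'tags.name': ['Biscuits', 'Cakes']}

# With two independent lists, every id appears to go with every name.
apparent_pairs = set(product(flat["tags.id"], flat["tags.name"]))
real_pairs = {(t["id"], t["name"]) for t in tags}
print(sorted(apparent_pairs - real_pairs))
# [(123, 'Cakes'), (456, 'Biscuits')]
```

The second print shows the spurious pairs, which is the same effect as "Biscuits" turning up under the bucket for ID 456 in the question.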
Also, note that I've added the line "include_in_parent": true to the mapping, which means that your nested tags will also behave like a "flat" array-like structure.
So, everything you had so far in your queries will still work without any changes to the queries.
But, for this particular query of yours, the aggregation needs to change to something like this:
{
  "aggs": {
    "baked_goods": {
      "nested": { "path": "item.meta.tags" },
      "aggs": {
        "name": {
          "terms": { "field": "item.meta.tags.id" },
          "aggs": {
            "name": {
              "terms": { "field": "item.meta.tags.name" }
            }
          }
        }
      }
    }
  }
}
And the result is like this:
"aggregations": {
  "baked_goods": {
    "doc_count": 9,
    "name": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 123,
          "doc_count": 3,
          "name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "biscuits", "doc_count": 3 }
            ]
          }
        },
        {
          "key": 456,
          "doc_count": 2,
          "name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "cakes", "doc_count": 2 }
            ]
          }
        },
        .....
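Once the response has this shape, turning it into (id, name) pairs on the client side is straightforward. The following is a rough sketch, assuming the aggregation names used in the query above (the outer and inner terms aggregations are both called "name"). The helper `tag_pairs` is hypothetical, and the sample data contains only the two buckets visible in the result.

```python
def tag_pairs(aggregations, outer="baked_goods", by_id="name", by_name="name"):
    """Return (tag_id, tag_name, doc_count) tuples from the nested result."""
    pairs = []
    for id_bucket in aggregations[outer][by_id]["buckets"]:
        name_buckets = id_bucket[by_name]["buckets"]
        # With a nested mapping each id bucket should hold exactly one name.
        name = name_buckets[0]["key"] if name_buckets else None
        pairs.append((id_bucket["key"], name, id_bucket["doc_count"]))
    return pairs


# Only the two buckets shown in the result above.
aggregations = {
    "baked_goods": {
        "doc_count": 9,
        "name": {
            "buckets": [
                {"key": 123, "doc_count": 3,
                 "name": {"buckets": [{"key": "biscuits", "doc_count": 3}]}},
                {"key": 456, "doc_count": 2,
                 "name": {"buckets": [{"key": "cakes", "doc_count": 2}]}},
            ]
        },
    }
}

print(tag_pairs(aggregations))
# [(123, 'biscuits', 3), (456, 'cakes', 2)]
```

The tag currently being viewed (123 in the question) still appears in the list, so it would need to be filtered out afterwards if only related tags are wanted.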