Terms Aggregation for nested field in Elastic Search

Question

I have the following mapping for a field in Elasticsearch (defined in YAML):

    my_analyzer:
        type: custom
        tokenizer: keyword
        filter: lowercase

    products_filter:
        type: "nested"
        properties:
            filter_name: {"type": "string", analyzer: "my_analyzer"}
            filter_value: {"type": "string", analyzer: "my_analyzer"}

Each document has many filters and looks like this:

"products_filter": [
  {
    "filter_name": "Rahmengröße",
    "filter_value": "33,5 cm"
  },
  {
    "filter_name": "color",
    "filter_value": "gelb"
  },
  {
    "filter_name": "Rahmengröße",
    "filter_value": "39,5 cm"
  },
  {
    "filter_name": "Rahmengröße",
    "filter_value": "45,5 cm"
  }
]

I am trying to get a list of unique filter names and, for each filter name, a list of its unique filter values.

That is, I want to get a structure like:

Rahmengröße:
    39,5 cm
    45,5 cm
    33,5 cm
Color:
    gelb

To get this, I tried a few aggregation variants, for example:

{
  "aggs": {
    "bla": {
      "terms": {
        "field": "products_filter.filter_name"
      },
      "aggs": {
        "bla2": {
          "terms": {
            "field": "products_filter.filter_value"
          }
        }
      }
    }
  }
}

But this request is wrong.

It returns the list of unique filter names, but each of them contains a list of ALL filter_values:

"bla": {
  "doc_count_error_upper_bound": 0,
  "sum_other_doc_count": 103,
  "buckets": [
    {
      "key": "color",
      "doc_count": 9,
      "bla2": {
        "doc_count_error_upper_bound": 4,
        "sum_other_doc_count": 366,
        "buckets": [
          {
            "key": "100",
            "doc_count": 5
          },
          {
            "key": "cm",
            "doc_count": 5
          },
          {
            "key": "unisex",
            "doc_count": 5
          },
          {
            "key": "11",
            "doc_count": 4
          },
          {
            "key": "160",
            "doc_count": 4
          },
          {
            "key": "22",
            "doc_count": 4
          },
          {
            "key": "a",
            "doc_count": 4
          },
          {
            "key": "alu",
            "doc_count": 4
          },
          {
            "key": "aluminium",
            "doc_count": 4
          },
          {
            "key": "aus",
            "doc_count": 4
          }
        ]
      }
    },

Additionally, I tried to use the reverse nested aggregation, but it doesn't help.

So I think there is some logical fault in my attempts?

Evaldas Buinauskas · Accepted Answer

As I've said, your issue is that your text is analyzed, and Elasticsearch always aggregates at the token level. To fix that, your field values have to be indexed as single tokens. There are two options:

do not analyze them
index them using a keyword analyzer plus a lowercase filter (for case-insensitive aggregations)
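
To illustrate why token-level aggregation produces buckets such as "cm" instead of "33,5 cm", here is a small Python sketch. It is not Elasticsearch code: `crude_analyze` is a hypothetical stand-in that only lowercases and splits on whitespace, which is far simpler than a real analyzer, and the values are the ones from the example document.

```python
from collections import Counter

# Filter values taken from the example document in the question.
values = ["33,5 cm", "39,5 cm", "45,5 cm", "gelb"]


def crude_analyze(text):
    """Hypothetical stand-in for an analyzer that splits text into tokens."""
    return text.lower().split()


def single_token(text):
    """Stand-in for a field indexed as one token: the whole value is the term."""
    return [text]


def term_counts(analyze):
    # Count each distinct term once per value, similar to how a terms
    # aggregation reports a doc_count per term.
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return counts


tokenized = term_counts(crude_analyze)
whole = term_counts(single_token)

print(tokenized)
print(whole)
```

With the splitting stand-in, "cm" becomes a bucket of its own with a count of 3, and no bucket holds a complete value. With single tokens, every bucket key is a complete filter value.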

The settings below create a custom keyword analyzer with a lowercase filter and folding of accented characters (ö => o and ß => ss), plus additional sub-fields (raw and keyword) on your fields, so they can be used for aggregation:

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "data": {
      "properties": {
        "products_filter": {
          "type": "nested",
          "properties": {
            "filter_name": {
              "type": "string",
              "analyzer": "standard",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "keyword": {
                  "type": "string",
                  "analyzer": "my_analyzer_keyword"
                }
              }
            },
            "filter_value": {
              "type": "string",
              "analyzer": "standard",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                },
                "keyword": {
                  "type": "string",
                  "analyzer": "my_analyzer_keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}
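
As a rough illustration of what the `keyword` sub-field ends up storing, the Python sketch below approximates the effect of a keyword tokenizer followed by `asciifolding` and `lowercase` on the example strings. The function name `approx_keyword_normalize` is hypothetical, and the folding here relies on Python's Unicode handling rather than Elasticsearch's own mapping table, so it should only be trusted for simple cases like these.

```python
import unicodedata


def approx_keyword_normalize(value):
    """Approximate keyword tokenizer + asciifolding + lowercase (assumption:
    Python's casefold/NFKD are close enough for these example strings)."""
    # casefold() lowercases and also expands the German sharp s to "ss".
    folded = value.casefold()
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", folded)
    # Dropping the combining marks leaves the unaccented base letters.
    # The value is never split, so it stays a single token.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


name_key = approx_keyword_normalize("Rahmengröße")
value_key = approx_keyword_normalize("33,5 cm")

print(name_key)
print(value_key)
```

Because the whole string is kept as one token, "Color" and "color" would land in the same bucket, while "33,5 cm" is left untouched.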

The test document you've given us:

PUT /test/data/1
{
  "products_filter": [
    {
      "filter_name": "Rahmengröße",
      "filter_value": "33,5 cm"
    },
    {
      "filter_name": "color",
      "filter_value": "gelb"
    },
    {
      "filter_name": "Rahmengröße",
      "filter_value": "39,5 cm"
    },
    {
      "filter_name": "Rahmengröße",
      "filter_value": "45,5 cm"
    }
  ]
}

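This is the query to aggregate using the raw field. The terms aggregations are wrapped in a nested aggregation on the products_filter path, because the field is of nested type: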

GET /test/_search
{
  "size": 0,
  "aggs": {
    "Nesting": {
      "nested": {
        "path": "products_filter"
      },
      "aggs": {
        "raw_names": {
          "terms": {
            "field": "products_filter.filter_name.raw",
            "size": 0
          },
          "aggs": {
            "raw_values": {
              "terms": {
                "field": "products_filter.filter_value.raw",
                "size": 0
              }
            }
          }
        }
      }
    }
  }
}

It returns the expected result (buckets with filter names and sub-buckets with their values):

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "Nesting": {
      "doc_count": 4,
      "raw_names": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "Rahmengröße",
            "doc_count": 3,
            "raw_values": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "33,5 cm",
                  "doc_count": 1
                },
                {
                  "key": "39,5 cm",
                  "doc_count": 1
                },
                {
                  "key": "45,5 cm",
                  "doc_count": 1
                }
              ]
            }
          },
          {
            "key": "color",
            "doc_count": 1,
            "raw_values": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "gelb",
                  "doc_count": 1
                }
              ]
            }
          }
        ]
      }
    }
  }
}
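
To get from this response to the "filter name: list of values" structure asked for in the question, the buckets can be flattened on the client side. The following is a minimal Python sketch, assuming the response has already been parsed into a dict; `names_to_values` is a hypothetical helper, and the `response` literal is a trimmed copy of the result above.

```python
def names_to_values(response, names_agg="raw_names", values_agg="raw_values"):
    """Map each filter-name bucket key to the list of its value bucket keys."""
    nesting = response["aggregations"]["Nesting"]
    return {
        name_bucket["key"]: [
            value_bucket["key"]
            for value_bucket in name_bucket[values_agg]["buckets"]
        ]
        for name_bucket in nesting[names_agg]["buckets"]
    }


# Trimmed copy of the aggregation result shown above.
response = {
    "aggregations": {
        "Nesting": {
            "doc_count": 4,
            "raw_names": {
                "buckets": [
                    {
                        "key": "Rahmengröße",
                        "doc_count": 3,
                        "raw_values": {
                            "buckets": [
                                {"key": "33,5 cm", "doc_count": 1},
                                {"key": "39,5 cm", "doc_count": 1},
                                {"key": "45,5 cm", "doc_count": 1},
                            ]
                        },
                    },
                    {
                        "key": "color",
                        "doc_count": 1,
                        "raw_values": {
                            "buckets": [{"key": "gelb", "doc_count": 1}]
                        },
                    },
                ]
            },
        }
    }
}

filters = names_to_values(response)
for name, values in filters.items():
    print(name + ":", ", ".join(values))
```

The same helper should work for the `keyword` variant of the query by passing `names_agg="keyword_names"` and `values_agg="keyword_values"`.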

Alternatively, you could use the field with the keyword analyzer (and some normalization) to get somewhat more generic, case-insensitive results:

GET /test/_search
{
  "size": 0,
  "aggs": {
    "Nesting": {
      "nested": {
        "path": "products_filter"
      },
      "aggs": {
        "keyword_names": {
          "terms": {
            "field": "products_filter.filter_name.keyword",
            "size": 0
          },
          "aggs": {
            "keyword_values": {
              "terms": {
                "field": "products_filter.filter_value.keyword",
                "size": 0
              }
            }
          }
        }
      }
    }
  }
}

That's the result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "Nesting": {
      "doc_count": 4,
      "keyword_names": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "rahmengrosse",
            "doc_count": 3,
            "keyword_values": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "33,5 cm",
                  "doc_count": 1
                },
                {
                  "key": "39,5 cm",
                  "doc_count": 1
                },
                {
                  "key": "45,5 cm",
                  "doc_count": 1
                }
              ]
            }
          },
          {
            "key": "color",
            "doc_count": 1,
            "keyword_values": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "gelb",
                  "doc_count": 1
                }
              ]
            }
          }
        ]
      }
    }
  }
}

Tags: aggregate-functions, elasticsearch

Aleksandr
