Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Elasticsearch aggregation on array items

Given below is my Elasticsearch document on which I want to fire an aggregation query.

{
  "id": 1,
  "attributes": [
    {
      "fieldId": 1,
      "value": "Male"
    },
    {
      "fieldId": 2,
      "value": "12/11/2015"
    }
  ]
}
{
  "id": 2,
  "attributes": [
    {
      "fieldId": 1,
      "value": "Male"
    },
    {
      "fieldId": 2,
      "value": "11/11/2015"
    }
  ]
}

The result has to be as follows.

[
  {
    "key": "Male",
    "doc_count": 1
  }
]
[
  {
    "key": "12/11/2015",
    "doc_count": 1
  },
  {
    "key": "11/11/2015",
    "doc_count": 1
  }
]

Is there a way that this can be achieved in Elasticsearch?

like image 978
hendrixchord Avatar asked Jan 06 '23 19:01

hendrixchord


1 Answers

That's possible. See this example:

We have to map attributes as nested type to be able to aggregate properly.

PUT /test
{
  "mappings": {
    "sample": {
      "properties": {
        "id": {
          "type": "integer"
        },
        "attributes": {
          "type": "nested",
          "properties": {
            "fieldId": {
              "type": "integer"
            },
            "value": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

Let's add your given test data:

PUT /test/sample/1
{"id":1,"attributes":[{"fieldId":1,"value":"Male"},{"fieldId":2,"value":"12/11/2015"}]}
PUT /test/sample/2
{"id":2,"attributes":[{"fieldId":1,"value":"Male"},{"fieldId":2,"value":"11/11/2015"}]}

And finally let's run this query:

GET /test/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "Nest": {
      "nested": {
        "path": "attributes"
      },
      "aggs": {
        "fieldIds": {
          "terms": {
            "field": "attributes.fieldId",
            "size": 0
          },
          "aggs": {
            "values": {
              "terms": {
                "field": "attributes.value",
                "size": 0
              }
            }
          }
        }
      }
    }
  }
}

What will it do?

  1. Run nested aggregation first in order to get into nested objects and aggregate them properly.
  2. Create buckets using terms aggregation for each fieldId, in your case we'll get two of them: 1 and 2.
  3. Run terms aggregation again for each of buckets above in order to get coresponding values.

So that's the output.

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "Nest": {
      "doc_count": 4,
      "fieldIds": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": 1,
            "doc_count": 2,
            "values": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Male",
                  "doc_count": 2
                }
              ]
            }
          },
          {
            "key": 2,
            "doc_count": 2,
            "values": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "11/11/2015",
                  "doc_count": 1
                },
                {
                  "key": "12/11/2015",
                  "doc_count": 1
                }
              ]
            }
          }
        ]
      }
    }
  }
}

It's not precisely what you've requested. But that's closest what you can get in Elasticsearch.

like image 50
Evaldas Buinauskas Avatar answered Jan 12 '23 13:01

Evaldas Buinauskas