Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to aggregate over dynamic fields in elasticsearch?

I am trying to aggregate over dynamic fields (different for different documents) via elasticsearch. Documents are like following:

[{
   "name": "galaxy note",
   "price": 123,
   "attributes": {
      "type": "phone",
      "weight": "140gm"
   }
},{
   "name": "shirt",
   "price": 123,
   "attributes": {
      "type": "clothing",
      "size": "m"
   }
}]

As you can see attributes change across documents. What Im trying to achieve is to aggregate fields of these attributes, like so:

{
     aggregations: {
         types: {
             buckets: [{key: 'phone', count: 123}, {key: 'clothing', count: 12}]
         }
     }
}

I am trying aggregation feature of elasticsearch to achieve this, but not able to find correct way. Is it possible to achieve via aggregation ? Or should I start looking in to facets, thought it seem to be depricated.

like image 721
Varun Oberoi Avatar asked Jul 17 '14 10:07

Varun Oberoi


People also ask

Is Elasticsearch good for aggregation?

Elasticsearch Aggregations provide you with the ability to group and perform calculations and statistics (such as sums and averages) on your data by using a simple search query. An aggregation can be viewed as a working unit that builds analytical information across a set of documents.

How do I turn off dynamic mapping in Elasticsearch?

You can disable dynamic mapping, both at the document and at the object level. Setting the dynamic parameter to false ignores new fields, and strict rejects the document if Elasticsearch encounters an unknown field. Use the update mapping API to update the dynamic setting on existing fields.

What is aggregation query Elasticsearch?

Elasticsearch organizes aggregations into three categories: Metric aggregations that calculate metrics, such as a sum or average, from field values. Bucket aggregations that group documents into buckets, also called bins, based on field values, ranges, or other criteria.


1 Answers

You have to define attributes as nested in your mapping and change the layout of the single attribute values to the fixed layout { key: DynamicKey, value: DynamicValue }

PUT /catalog
{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "article": {
      "properties": {
        "name": { 
          "type" : "string", 
          "index" : "not_analyzed" 
        },
        "price": { 
          "type" : "integer" 
        },
        "attributes": {
          "type": "nested",
          "properties": {
            "key": {
              "type": "string"
            },
            "value": {
              "type": "string"
            }
          }
        }
      }  
    }
  }
}

You may than index your articles like this

POST /catalog/article
{
  "name": "shirt",
  "price": 123,
  "attributes": [
    { "key": "type", "value": "clothing"},
    { "key": "size", "value": "m"}
  ]
}

POST /catalog/article
{
  "name": "galaxy note",
  "price": 123,
  "attributes": [
    { "key": "type", "value": "phone"},
    { "key": "weight", "value": "140gm"}
  ]
}

After all you are then able to aggregate over the nested attributes

GET /catalog/_search
{
  "query":{
    "match_all":{}
  },     
  "aggs": {
    "attributes": {
      "nested": {
        "path": "attributes"
      },
      "aggs": {
        "key": {
          "terms": {
            "field": "attributes.key"
          },
          "aggs": {
            "value": {
              "terms": {
                "field": "attributes.value"
              }
            }
          }
        }
      }
    }
  }
}

Which then gives you the information you requested in a slightly different form

[...]
"buckets": [
  {
    "key": "type",
    "doc_count": 2,
    "value": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
      {
        "key": "clothing",
        "doc_count": 1
      }, {
        "key": "phone",
        "doc_count": 1
      }
      ]
    }
  },
[...]
like image 121
Georg Engel Avatar answered Sep 21 '22 18:09

Georg Engel