Elasticsearch

Question

Let's say I have a bunch of documents like this:

{
    "foo" : [1, 2, 3]
}

{
    "foo" : [3, 4, 5]
}

For a query run against these documents, I'm looking for a way to return an array of all values for foo (ideally the unique values, but duplicates are OK):

{
    "foo" : [1, 2, 3, 3, 4, 5]
}

I've looked into the aggregations APIs but I can't see how to achieve this, if it's possible at all. I could of course compile the results manually in code, but I could have thousands of documents, and it would be far cleaner to get the result directly from the query.

Ocaso Protal · Accepted Answer

You can use a Scripted Metric Aggregation with a reduce_script.

Set up some test data:

curl -XPUT http://localhost:9200/testing/foo/1 -d '{ "foo" : [1, 2, 3] }'
curl -XPUT http://localhost:9200/testing/foo/2 -d '{ "foo" : [4, 5, 6] }'

Now try this aggregation:

curl -XGET "http://localhost:9200/testing/foo/_search" -d'
{
  "size": 0,
  "aggs": {
    "fooreduced": {
      "scripted_metric": {
        "init_script": "_agg[\"result\"] = []",
        "map_script":  "_agg.result.add(doc[\"foo\"].values)",
        "reduce_script": "reduced = []; for (a in _aggs) { for (entry in a) { reduced += entry.value } }; return reduced.flatten().sort()"
      }
    }
  }
}'

The call will return this:

{
  "took": 50,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "fooreduced": {
      "value": [
        1,
        2,
        3,
        4,
        5,
        6
      ]
    }
  }
}
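
To make the three script phases easier to follow, here is a rough Python simulation of what the aggregation does. It is only a sketch and not Elasticsearch code: the function names mirror the script parameters above, while run and the way the documents are spread over shards are assumptions made for illustration.

```python
# Rough simulation of the scripted_metric phases; this is not Elasticsearch code.

def init_script():
    # Runs once per shard: create the shard-local state (_agg).
    return {"result": []}

def map_script(agg, doc):
    # Runs once per matching document: add that document's values as one
    # nested list, like _agg.result.add(doc["foo"].values).
    agg["result"].append(list(doc["foo"]))

def reduce_script(aggs):
    # Runs once over the states returned by all shards (_aggs).
    reduced = []
    for agg in aggs:
        for nested in agg.values():
            reduced += nested
    # Stand-in for flatten().sort(): remove one level of nesting, then sort.
    flat = [value for values in reduced for value in values]
    return sorted(flat)

def run(shards):
    # Hypothetical driver: one init per shard, one map per document, one reduce.
    states = []
    for docs in shards:
        agg = init_script()
        for doc in docs:
            map_script(agg, doc)
        states.append(agg)
    return reduce_script(states)

# Assumed layout: each test document sits on its own shard, a third shard is empty.
shards = [[{"foo": [1, 2, 3]}], [{"foo": [4, 5, 6]}], []]
print(run(shards))  # prints [1, 2, 3, 4, 5, 6]
```

Because every shard contributes a list of lists, the reduce step has to flatten before it sorts, which is why .flatten() appears in the reduce_script.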

There might be a solution without .flatten(), but I don't know Groovy well enough (yet) to find one. I also can't say how well this aggregation performs; you will have to test that yourself.
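
One way the flattening step could be avoided, sketched in Python rather than Groovy: extend the shard-local list with each document's values instead of appending the whole array as a nested element. The function merge_values and its unique flag are hypothetical names for this illustration. Whether the same works inside the Groovy map script (for example with addAll instead of add) is an assumption that has not been tested here.

```python
def merge_values(shards, unique=False):
    # Collect a flat list per shard, then merge the shard lists.
    per_shard = []
    for docs in shards:
        result = []
        for doc in docs:
            # extend keeps the list flat, so no flatten step is needed later
            result.extend(doc["foo"])
        per_shard.append(result)
    merged = [value for result in per_shard for value in result]
    if unique:
        # Drop duplicates, as the question would ideally like
        merged = set(merged)
    return sorted(merged)

# Documents from the question, assumed to sit on two different shards
question_docs = [[{"foo": [1, 2, 3]}], [{"foo": [3, 4, 5]}]]
print(merge_values(question_docs))               # prints [1, 2, 3, 3, 4, 5]
print(merge_values(question_docs, unique=True))  # prints [1, 2, 3, 4, 5]
```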

Elasticsearch - combine fields from multiple documents

Tags: arrays, merge, aggregate

Graham
