ElasticSearch - string concat aggregation?

I've got the following simple mapping:

"element": {
  "dynamic": "false",
  "properties": {
    "id": { "type": "string", "index": "not_analyzed" },
    "group": { "type": "string", "index": "not_analyzed" },
    "type": { "type": "string", "index": "not_analyzed" }
  }
} 

This is basically a flattened way to store a Group object:

{
  id : "...",
  elements : [
    {id: "...", type: "..."},
    ...
    {id: "...", type: "..."}
  ] 
}
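To make the relation between the two shapes concrete, here is a minimal Python sketch of how one Group object might be flattened into documents matching the mapping. The function name `flatten_group` and the sample values are hypothetical, and the field correspondence (element id to `id`, group id to `group`) is an assumption read off the mapping, not something stated explicitly.

```python
def flatten_group(group):
    """Turn one Group object into flat element documents (id, group, type)."""
    return [
        {"id": element["id"], "group": group["id"], "type": element["type"]}
        for element in group["elements"]  # list order is preserved
    ]


# Hypothetical sample with placeholder values instead of real ids.
GROUP = {
    "id": "g1",
    "elements": [
        {"id": "e1", "type": "A"},
        {"id": "e2", "type": "B"},
    ],
}

docs = flatten_group(GROUP)
print(docs)
```

Each resulting dictionary corresponds to one document of the `element` mapping.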

I want to find how many different groups exist sharing the same set of element types (ordered, including repetitions).

An obvious solution would be to change the schema to:

"element": {
  "dynamic": "false",
  "properties": {
    "group": { "type": "string", "index": "not_analyzed" },
    "concatenated_list_of_types": { "type": "string", "index": "not_analyzed" }
  }
} 

But our requirements mean we need to be able to exclude some types from the group-by (aggregation) at query time, so a precomputed concatenation won't work :(

All fields of the document are Mongo ids, so in SQL I would do something like this:

SELECT COUNT(group_id), concat_value FROM (
    SELECT GROUP_CONCAT(type_id) AS concat_value, group_id
    FROM table
    WHERE type_id != 'some_filtered_out_type_id'
    GROUP BY group_id
) T GROUP BY concat_value
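For illustration, the same two-level grouping can be sketched in plain Python on in-memory rows. This only shows the intended semantics (filter, concatenate per group, count per concatenated value); it is not an Elasticsearch solution, and the function name `count_type_sequences` and the sample rows are made up.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (element id, group id, type id), in element order.
ROWS = [
    ("e1", "g1", "A"),
    ("e2", "g1", "B"),
    ("e3", "g2", "A"),
    ("e4", "g2", "X"),
    ("e5", "g2", "B"),
    ("e6", "g3", "A"),
    ("e7", "g3", "A"),
]


def count_type_sequences(rows, excluded_types=frozenset(), sep=","):
    """Count groups per concatenated type sequence, skipping excluded types."""
    types_by_group = defaultdict(list)
    for _element_id, group_id, type_id in rows:
        if type_id not in excluded_types:  # the WHERE clause
            types_by_group[group_id].append(type_id)  # inner GROUP BY group_id
    # Outer query: one count per distinct concatenated value.
    return Counter(sep.join(types) for types in types_by_group.values())


counts = count_type_sequences(ROWS, excluded_types={"X"})
print(dict(counts))
```

With type `X` excluded, groups `g1` and `g2` collapse to the same sequence, while `g3` keeps its repeated type.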

In Elastic, with the given mapping, filtering is really easy, and counting is not a problem either, assuming we have a concatenated value. The missing piece is the concatenation itself: needless to say, the sum aggregation does not work for strings.

How can I get this working? :)

Thanks!

Alexander Mikhalchenko asked Sep 23 '16 14:09



1 Answer

I eventually solved this problem with scripting and by changing the mapping:

{
  "mappings": {
    "group": {
      "dynamic": "false",
      "properties": {
        "id": { "type": "string", "index": "not_analyzed" },
        "elements": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}

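There is still an issue with duplicate elements in the array: ScriptDocValues.Strings strips out duplicates for some reason (probably because doc values are stored as a sorted, de-duplicated set per document, which would also lose the original order). Still, here's an aggregation that counts by string concatenation: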

{
  "aggs": {
    "path": {
      "scripted_metric": {
        "map_script": "key = doc['elements'].join('-'); _agg[key] = _agg[key] ? _agg[key] + 1 : 1",
        "combine_script": "_agg",
        "reduce_script": "_aggs.collectMany { it.entrySet() }.inject( [:] ) { result, e -> result << [ (e.key):e.value + ( result[ e.key ] ?: 0 ) ]}"
      }
    }
  }
}

The result would be as follows:

  "aggregations" : {
    "path" : {
      "value" : {
        "5639abfb5cba47087e8b457e" : 362,
        "568bfc495cba47fc308b4567" : 3695,
        "5666d9d65cba47701c413c53" : 14,
        "5639abfb5cba47087e8b4571-5639abfb5cba47087e8b457b" : 1,
        "570eb97abe529e83498b473d" : 1
      }
    }
  }
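As a rough illustration of what the three scripts do, here is a Python sketch that mimics the phases of the scripted metric on in-memory data. It does not call Elasticsearch; the shard contents and function names are hypothetical, and the `sorted(set(...))` step is an assumption used to model the duplicate-stripping behaviour mentioned above.

```python
from collections import Counter


def map_phase(shard_docs, sep="-"):
    """Mimic map_script: count documents per joined key on one shard."""
    agg = Counter()
    for doc in shard_docs:
        # Assumption: doc values act like a sorted set of unique strings,
        # so duplicates and the original order are gone before the join.
        doc_values = sorted(set(doc["elements"]))
        agg[sep.join(doc_values)] += 1
    return agg  # combine_script returns this per-shard map unchanged


def reduce_phase(shard_aggs):
    """Mimic reduce_script: sum the per-shard counts key by key."""
    total = Counter()
    for agg in shard_aggs:
        total.update(agg)  # Counter.update adds counts instead of replacing
    return dict(total)


# Hypothetical documents spread over two shards.
SHARD_1 = [{"elements": ["t1"]}, {"elements": ["t2", "t1"]}]
SHARD_2 = [{"elements": ["t1"]}, {"elements": ["t1", "t1", "t2"]}]

result = reduce_phase([map_phase(SHARD_1), map_phase(SHARD_2)])
print(result)
```

Note how `["t1", "t1", "t2"]` and `["t2", "t1"]` end up under the same key in this model, which is exactly the limitation with duplicates described above.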
Alexander Mikhalchenko answered Sep 19 '22 12:09
