
Elasticsearch distinct parent sub-aggregation without nested field

In elasticsearch 6.2 I have a parent-child relationship :

Document -> NamedEntity

I want to aggregate NamedEntity on the mention field, getting for each mention both the number of occurrences and the number of distinct documents that contain it.

My use case is :

doc1 contains 'NER'(_id=ner11), 'NER'(_id=ner12)
doc2 contains 'NER'(_id=ner2)

The parent/child relation is implemented with a join field. In the Document I have a field :

join: {
  name: "Document"
}

And in the NamedEntity children :

join: {
  name: "NamedEntity",
  parent: "parent_id"
}

with _routing set to parent_id.

So I tried a terms sub-aggregation on the join field:

curl -XPOST elasticsearch:9200/datashare-testjs/_search?pretty -H 'Content-Type: application/json' -d '
{"query":{"term":{"type":"NamedEntity"}},
 "aggs":{
   "mentions":{
     "terms":{
       "field":"mention"
     },
     "aggs":{
       "docs":{
         "terms":{"field":"join"}
       }
     }
   }
 }
}'

And I have the following response :

"aggregations" : {
  "mentions" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "NER",
        "doc_count" : 3,
        "docs" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "NamedEntity",
              "doc_count" : 3 <-- WRONG ! There are 2 distinct documents
            }
          ]
        }
      }
    ]
  }
}

I get the expected 3 occurrences in mentions.buckets.doc_count. But in mentions.buckets.docs.buckets.doc_count I would like to get 2 documents, not 3, like a SELECT COUNT(DISTINCT ...) in SQL.

If I aggregate with "terms":{"field":"join.parent"} I have :

...
"docs" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [ ]
}
...

I also tried a cardinality aggregation: on the join field it returns 1, and on join.parent it returns 0.

So how do I get a distinct count of parents in an aggregation without using a reverse nested aggregation?


As @AndreiStefan asked, here is the mapping. It is a simple 1-N relation between Document(content) and NamedEntity(mention) in an ES 6 mapping (fields are defined on the same level) :

curl -XPUT elasticsearch:9200/datashare-testjs -H 'Content-Type: application/json' -d '
{
    "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "text",
          "index_options": "offsets"
        },
        "type": {
          "type": "keyword"
        },
        "join": {
          "type": "join",
          "relations": {
            "Document": "NamedEntity"
          }
        },
        "mention": {
          "type": "keyword"
        }
      }
    }
}}'

And the requests for a minimal dataset :

curl -XPUT elasticsearch:9200/datashare-testjs/doc/doc1 -H 'Content-Type: application/json' -d '{"type": "Document", "join": {"name": "Document"}, "content": "a NER document contains 2 NER"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/doc2 -H 'Content-Type: application/json' -d '{"type": "Document", "join": {"name": "Document"}, "content": "another NER document"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/ner11?routing=doc1 -H 'Content-Type: application/json' -d '{"type": "NamedEntity", "join": {"name": "NamedEntity", "parent": "doc1"}, "mention": "NER"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/ner12?routing=doc1 -H 'Content-Type: application/json' -d '{"type": "NamedEntity", "join": {"name": "NamedEntity", "parent": "doc1"}, "mention": "NER"}'
curl -XPUT elasticsearch:9200/datashare-testjs/doc/ner2?routing=doc2 -H 'Content-Type: application/json' -d '{"type": "NamedEntity", "join": {"name": "NamedEntity", "parent": "doc2"}, "mention": "NER"}'
Bruno Thomas asked Nov 30 '25 16:11



1 Answer

  "aggs": {
    "mentions": {
      "terms": {
        "field": "mention"
      },
      "aggs": {
        "docs": {
          "terms": {
            "field": "join"
          },
          "aggs": {
            "uniques": {
              "cardinality": {
                "field": "join#Document"
              }
            }
          }
        }
      }
    }
  }

OR if you just want the count:

  "aggs": {
    "mentions": {
      "terms": {
        "field": "mention"
      },
      "aggs": {
        "uniques": {
          "cardinality": {
            "field": "join#Document"
          }
        }
      }
    }
  }

If you need a custom ordering (by unique counts):

  "aggs": {
    "mentions": {
      "terms": {
        "field": "mention",
        "order": {
          "uniques": "desc"
        }
      },
      "aggs": {
        "uniques": {
          "cardinality": {
            "field": "join#Document"
          }
        }
      }
    }
  }
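
For reference, here is a sketch of a complete request that combines the query from the question with the second variant above. According to the Elasticsearch parent-join documentation, a join field also creates one extra field per parent relation, named after the join field followed by # and the parent name (join#Document here), which holds the parent _id for child documents. A cardinality aggregation on that field therefore counts distinct parents. The names BODY and distinct_parents are hypothetical, and "size": 0 is an addition to suppress the hits; neither comes from the original posts.

```shell
# Request body: the question's query plus the cardinality sub-aggregation.
# "size": 0 is an assumption added here so that only aggregations come back.
BODY='{
  "size": 0,
  "query": {"term": {"type": "NamedEntity"}},
  "aggs": {
    "mentions": {
      "terms": {"field": "mention"},
      "aggs": {
        "uniques": {
          "cardinality": {"field": "join#Document"}
        }
      }
    }
  }
}'

# Hypothetical helper: posts the body to the index used in the question.
distinct_parents() {
  curl -s -XPOST 'elasticsearch:9200/datashare-testjs/_search?pretty' \
    -H 'Content-Type: application/json' -d "$BODY"
}

# Against a live cluster, run: distinct_parents
```

With the minimal dataset from the question, the aim is for the NER bucket to keep doc_count 3 while uniques.value reflects the two parents, doc1 and doc2. Note that cardinality is an approximate count (HyperLogLog++); it is expected to be accurate for small numbers of distinct values, and precision_threshold can be raised if needed.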
Andrei Stefan answered Dec 02 '25 07:12



