Conditional aggregation on multi-field in Elasticsearch

Tags:

elasticsearch

Here's an example of a document in my ES index:

{ 
    "concepts": [ 
        { 
            "type": "location",
            "entities": [ 
                { "text": "Raleigh" }, 
                { "text": "Damascus" }, 
                { "text": "Brussels" } 
            ] 
        }, 
        { 
            "type": "person", 
            "entities": [ 
                { "text": "Johnny Cash" }, 
                { "text": "Barack Obama" }, 
                { "text": "Vladimir Putin" }, 
                { "text": "John Hancock" } 
            ] 
        }, 
        { 
            "type": "organization", 
            "entities": [ 
                { "text": "WTO" }, 
                { "text": "IMF" }, 
                { "text": "United States of America" } 
            ] 
        } 
    ] 
}

I'm trying to aggregate and count the frequency of each concept entity in my set of documents for a specific concept type. Let's say I'm only interested in aggregating concept entities of type "location". My aggregation buckets are then going to be "concepts.entities.text", but I only want to aggregate them if "concepts.type" is equal to "location". Here's my attempt:

{
    "query": {
        // Whatever query
    },
    "aggs": {
        "location_concept_type": {
            "filter": {
                "term": { "concepts.type": "location" }
            },
            "aggs": {
                "entities": {
                    "terms": { "field": "concepts.entities.text" }
                }
            }
        }
    }
}

The problem is that the filter only excludes documents that have no concept entities of type "location". For documents that have concept entities of type "location" as well as other types, it buckets all the concept entities, regardless of concept type.

I have also tried restructuring my document as follows:

{ 
    "concepts": [ 
        { 
            "type": "location",
            "text": "Raleigh"
        },
        { 
            "type": "location",
            "text": "Damascus"
        },
        { 
            "type": "location",
            "text": "Brussels"
        }, 
        { 
            "type": "person",
            "text": "Johnny Cash"
        },
        { 
            "type": "person",
            "text": "Barack Obama"
        },
        { 
            "type": "person",
            "text": "Vladimir Putin"
        },
        { 
            "type": "person",
            "text": "John Hancock"
        }, 
        { 
            "type": "organization",
            "text": "WTO" 
        },
        { 
            "type": "organization",
            "text": "IMF" 
        },
        { 
            "type": "organization",
            "text": "United States of America" 
        }
    ] 
}

But that doesn't work either. Finally, I cannot use the concept type as the key (which I believe would solve my problem), because I also need to be able to aggregate across all concept types, and the number of concept types is potentially unbounded and changing.

Any idea of how to proceed? Thanks in advance for your help.

asked Jul 10 '14 20:07

cwarny

1 Answer

I found a workaround, though it is something of a hack. I'll post it as an answer, but please feel free to add a more elegant alternative. I added a property alongside "type" and "text", call it "text_exp", that combines the type and the text as follows:

{
    "concepts": [
        { "type": "location", "text": "Raleigh", "text_exp": "location~Raleigh" },
        //...
    ]
}
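
For illustration, here is a minimal sketch of how "text_exp" could be added to each concept before indexing. The helper name add_text_exp and the sep parameter are hypothetical, not part of any Elasticsearch API, and the sketch assumes the restructured document layout from the question (one "type" and one "text" per concept):

```python
def add_text_exp(doc, sep="~"):
    """Return a copy of doc in which every concept also carries a
    combined "text_exp" value of the form "<type><sep><text>"."""
    concepts = [
        {**concept, "text_exp": f"{concept['type']}{sep}{concept['text']}"}
        for concept in doc.get("concepts", [])
    ]
    return {**doc, "concepts": concepts}


# Made-up sample document in the restructured layout.
doc = {
    "concepts": [
        {"type": "location", "text": "Raleigh"},
        {"type": "person", "text": "Johnny Cash"},
    ]
}

indexed = add_text_exp(doc)
for concept in indexed["concepts"]:
    print(concept["text_exp"])
```

The separator should be a character that never appears in a concept type, otherwise the split on the way back out becomes ambiguous.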

Then I use a regex in the terms aggregation, as follows. Let's say I only want to aggregate entities of type "location":

{
    "query": {
        // Whatever query
    },
    "aggs": {
        "location_entities": {
            "terms": { 
                "field": "concepts.text_exp",
                "include": "location~.*"
            }
        }
    }
}

Then, in the response, I split each bucket key on "~" and keep the right-hand part.
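
As a rough sketch of that last step, the snippet below turns the buckets of the terms aggregation into a plain mapping from entity text to document count. The function name entity_counts and the sample response are made up for illustration. It also assumes that "concepts.text_exp" is indexed as a single unanalyzed term, since an analyzed string field would likely be tokenized and the keys would no longer look like "location~Raleigh":

```python
def entity_counts(response, agg_name="location_entities", sep="~"):
    """Map each entity text to its doc_count, dropping the type prefix."""
    buckets = response["aggregations"][agg_name]["buckets"]
    counts = {}
    for bucket in buckets:
        # Split on the first separator only, so entity texts that
        # themselves contain the separator stay intact.
        _type, _sep, text = bucket["key"].partition(sep)
        counts[text] = bucket["doc_count"]
    return counts


# Made-up response fragment in the shape returned by a terms aggregation.
sample_response = {
    "aggregations": {
        "location_entities": {
            "buckets": [
                {"key": "location~Raleigh", "doc_count": 3},
                {"key": "location~Damascus", "doc_count": 2},
            ]
        }
    }
}

counts = entity_counts(sample_response)
print(counts)
```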

answered Oct 06 '22 15:10

cwarny
