Elasticsearch query with nested aggregations causing out of memory

Tags:

elasticsearch

I have Elasticsearch installed with 16 GB of memory. I started using aggregations, but ran into a "java.lang.OutOfMemoryError: Java heap space" error when I issued the following query:

POST /test-index-syslog3/type-syslog/_search
{
    "query": {
        "query_string": {
           "default_field": "DstCountry",
           "query": "CN"
        }
    },
    "aggs": {
        "whatever": {
            "terms": {
                "field" : "SrcIP"
            },
            "aggs": {
                "destination_ip": {
                    "terms": {
                        "field" : "DstIP"
                    },
                    "aggs": {
                        "port" : {
                            "terms": {
                                "field" : "DstPort"
                            }
                        }
                    }
                }
            }
        }
    }
}

The query_string itself only returns 1266 hits, so I'm a bit confused by the OOM error.

Am I using aggregations incorrectly? If not, what can I do to troubleshoot this issue? Thanks!

asked Mar 07 '14 16:03

2 Answers

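You are loading the entire SrcIP, DstIP, and DstPort fields into memory in order to aggregate on them. This is because Elasticsearch un-inverts the entire field to be able to rapidly look up a document's value for a field given its ID. That loading covers every document in the index, not just the 1266 hits your query matches, which is why a small result set can still exhaust the heap.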

If you will mostly be aggregating over a small subset of the data, you should look into using doc values. With doc values, a document's value is stored on disk in a way that makes it easy to look up given the document's ID. There's a bit more overhead to it, but that way you leave it to the operating system's file system cache to keep the relevant pages in memory, instead of having to load the entire field onto the heap.
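
As a rough sketch of what enabling doc values could look like, the body below could be sent as PUT /test-index-syslog4 to create a new index. Several things here are assumptions rather than facts from the question: the index name test-index-syslog4 is hypothetical (an existing field's mapping generally cannot be changed in place, so the data would have to be reindexed), the field types are guesses, and it assumes a 1.x release whose mapping accepts the doc_values setting on not_analyzed string fields. On 2.0 and later, doc values are enabled by default for not_analyzed fields, so this step is usually unnecessary there.

```json
{
  "mappings": {
    "type-syslog": {
      "properties": {
        "SrcIP":   { "type": "string",  "index": "not_analyzed", "doc_values": true },
        "DstIP":   { "type": "string",  "index": "not_analyzed", "doc_values": true },
        "DstPort": { "type": "integer", "doc_values": true }
      }
    }
  }
}
```

After reindexing into an index mapped like this, the aggregation from the question can be run unchanged against the new index.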

answered Sep 16 '22 15:09

Alex Brasetvik

I can't be sure without seeing the mapping, but judging by the value, the DstCountry field could be not_analyzed. Then you could replace the query with a filter inside the aggregation. Maybe that helps.

Also check whether the fields you use in your aggregation are not_analyzed.
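
A minimal sketch of the filter idea, sent to the same POST /test-index-syslog3/type-syslog/_search endpoint as in the question. It assumes DstCountry is not_analyzed, so that the term filter matches the stored value "CN" exactly; the aggregation name cn_only is made up for illustration, and "size": 0 only suppresses the hit list. Note that this narrows which documents are counted, but the aggregated fields still have to be loaded as described in the other answer.

```json
{
  "size": 0,
  "aggs": {
    "cn_only": {
      "filter": { "term": { "DstCountry": "CN" } },
      "aggs": {
        "whatever": {
          "terms": { "field": "SrcIP" },
          "aggs": {
            "destination_ip": {
              "terms": { "field": "DstIP" },
              "aggs": {
                "port": {
                  "terms": { "field": "DstPort" }
                }
              }
            }
          }
        }
      }
    }
  }
}
```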

answered Sep 19 '22 15:09

Jettro Coenradie
