
Elasticsearch query with nested aggregations causing out of memory

I have Elasticsearch installed with 16 GB of memory. I started using aggregations, but ran into a "java.lang.OutOfMemoryError: Java heap space" error when I issued the following query:

POST /test-index-syslog3/type-syslog/_search
{
    "query": {
        "query_string": {
           "default_field": "DstCountry",
           "query": "CN"
        }
    },
    "aggs": {
        "whatever": {
            "terms": {
                "field" : "SrcIP"
            },
            "aggs": {
                "destination_ip": {
                    "terms": {
                        "field" : "DstIP"
                    },
                    "aggs": {
                        "port" : {
                            "terms": {
                                "field" : "DstPort"
                            }
                        }
                    }
                }
            }
        }
    }
}

The query_string itself only returns 1,266 hits, so I'm a bit confused by the OOM error.

Am I using aggregations incorrectly? If not, what can I do to troubleshoot this issue? Thanks!

Sgt B asked Mar 07 '14 16:03




2 Answers

You are loading the entire SrcIP, DstIP, and DstPort fields into memory in order to aggregate on them, regardless of how few documents the query matches. This is because Elasticsearch un-inverts the entire field so that it can rapidly look up a document's value for a field given the document's ID.
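To make the un-inverting step concrete, here is a deliberately simplified Python sketch. It is not Elasticsearch's actual data structure; the toy index, the document IDs, and the `uninvert` helper are all hypothetical. It only illustrates why the in-memory structure covers every document in the index rather than just the hits.

```python
from collections import defaultdict

def uninvert(inverted_index, num_docs):
    """Turn term -> [doc ids] into a doc id -> term array covering every document."""
    values = [None] * num_docs
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            values[doc_id] = term
    return values

# Hypothetical toy inverted index: term -> ids of the documents containing it.
src_ip_index = {"10.0.0.1": [0, 3], "10.0.0.2": [1], "10.0.0.3": [2, 4]}
field_data = uninvert(src_ip_index, num_docs=5)

# Suppose the query matched only documents 1 and 4. The aggregation needs
# just two lookups, yet field_data still holds an entry for all five documents.
hits = [1, 4]
counts = defaultdict(int)
for doc_id in hits:
    counts[field_data[doc_id]] += 1
```

In this toy model the cost of `field_data` grows with the number of documents in the index, not with the number of hits, which matches the behaviour described above.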

If you will mostly be aggregating over a small subset of the data, you should look into doc values. With doc values, each document's value is written to disk at index time in a layout that makes it easy to look up given the document's ID. There's a bit more overhead to it, but that way you leave it to the operating system's file system cache to keep the relevant pages in memory, instead of having to load the entire field onto the heap.
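As a rough sketch of what enabling doc values might look like, the Python snippet below builds a mapping body for the type from the question. The field types are assumptions (strings for the IP fields, an integer for the port), since the real mapping isn't shown, and the `doc_values` flag is written the way 1.x releases that support it accept it; check the mapping documentation for your version. Later versions enable doc values by default for not_analyzed fields. `doc_values_mapping` is a hypothetical helper, not part of any client library.

```python
import json

def doc_values_mapping(type_name, field_types):
    """Build a mapping body that turns on doc values for the given fields."""
    properties = {}
    for field, field_type in field_types.items():
        spec = {"type": field_type, "doc_values": True}
        if field_type == "string":
            # Doc values on strings require the field not to be analyzed.
            spec["index"] = "not_analyzed"
        properties[field] = spec
    return {type_name: {"properties": properties}}

# Assumed field types; adjust them to match the real mapping.
mapping = doc_values_mapping(
    "type-syslog",
    {"SrcIP": "string", "DstIP": "string", "DstPort": "integer"},
)
print(json.dumps(mapping, indent=2))
```

Because doc values are written at index time, the mapping would need to be in place on a new index and the existing documents reindexed into it before the aggregations benefit.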

Alex Brasetvik answered Sep 16 '22 15:09


I can't be sure about the mapping, of course, but judging by the value, the DstCountry field could be mapped as not_analyzed. Then you could replace the query with a filter inside the aggregation. Maybe that helps.

Also check whether the fields you use in your aggregation are mapped as not_analyzed.
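A minimal sketch of the filter-inside-the-aggregation idea, assuming DstCountry is indexed as not_analyzed so that a term filter on the exact value "CN" matches. The helper names and the aggregation names (`matching`, `by_SrcIP`, and so on) are hypothetical, chosen for illustration only.

```python
import json

def nested_terms(fields):
    """Nest one terms aggregation per field, outermost field first."""
    aggs = None
    for field in reversed(fields):
        node = {"terms": {"field": field}}
        if aggs is not None:
            node["aggs"] = aggs
        aggs = {"by_" + field: node}
    return aggs

def filtered_aggregation(filter_field, value, fields):
    """Wrap the nested terms aggregations in a filter aggregation."""
    return {
        "size": 0,  # only the aggregation results are wanted, not the hits
        "aggs": {
            "matching": {
                "filter": {"term": {filter_field: value}},
                "aggs": nested_terms(fields),
            }
        },
    }

body = filtered_aggregation("DstCountry", "CN", ["SrcIP", "DstIP", "DstPort"])
print(json.dumps(body, indent=2))
```

The printed body could be sent to the same _search endpoint as in the question. Whether it lowers memory use depends on how the fields are loaded, so treat it as something to try rather than a guaranteed fix.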

Jettro Coenradie answered Sep 19 '22 15:09