How to decide the memory requirement for my Elasticsearch server

Tags: memory, filesystems, elasticsearch, redhat

I have the following scenario:

An Elasticsearch database with about 1.4 TB of data, reporting these shard counts:

 "_shards": {
     "total": 202,
     "successful": 101,
     "failed": 0
}

Each index is between approximately 3 GB and 30 GB, and in the near future we expect about 30 GB of new data per day.

OS information:

 NAME="Red Hat Enterprise Linux Server"
 VERSION="7.2 (Maipo)"
 ID="rhel"
 ID_LIKE="fedora"
 VERSION_ID="7.2"
 PRETTY_NAME="Red Hat Enterprise Linux Server 7.2 (Maipo)"

The system has 32 GB of RAM and a 2 TB filesystem (1.4 TB used). I have configured a maximum of 15 GB of memory for the Elasticsearch server, but this is not enough to query this database: the server hangs on a single query.

I will be adding 1 TB to the filesystem on this server, bringing the total to 3 TB. I am also planning to increase the memory to 128 GB, which is a rough estimate.

Could someone help me determine the minimum RAM required for the server to handle at least 50 simultaneous requests?

It would be greatly appreciated if you could suggest a tool or formula for analyzing this requirement. It would also help if you could share another scenario with numbers that I can use to estimate my resource needs.

asked May 17 '17 21:05

siva

2 Answers

If you're here for a rule of thumb, I'd say that on modern ES and Java, 10-20GB of heap per TB of data (I'm thinking of the typical ELK use-case) should be enough. Multiplying by 2, that's 20-40GB of total RAM per TB.
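As a quick worked example, the rule of thumb above can be applied to the 1.4 TB from the question. This is only a sketch of the arithmetic: the function name and its parameters are made up for illustration, and the 10-20GB per TB range and the factor of 2 are taken straight from the paragraph above, not from any official sizing formula.

```python
def ram_estimate_gb(data_tb, heap_gb_per_tb=(10, 20), total_multiplier=2):
    """Apply the rule of thumb: heap per TB of data, then double it for total RAM.

    All names and defaults here are illustrative assumptions.
    """
    low_heap = round(data_tb * heap_gb_per_tb[0], 1)
    high_heap = round(data_tb * heap_gb_per_tb[1], 1)
    return {
        "heap_gb": (low_heap, high_heap),
        "total_ram_gb": (
            round(low_heap * total_multiplier, 1),
            round(high_heap * total_multiplier, 1),
        ),
    }


# 1.4 TB is the data size mentioned in the question.
estimate = ram_estimate_gb(1.4)
print("heap (GB):", estimate["heap_gb"])
print("total RAM (GB):", estimate["total_ram_gb"])
```

For 1.4 TB this works out to roughly 14-28GB of heap and 28-56GB of total RAM, so treat it as a starting point to monitor and adjust rather than a guarantee.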

Now for the detailed answer :) There are two types of memory that are relevant here:

JVM heap
OS cache (the OS will use free memory to cache index files)

OS cache is down to your IO requirements (queries do lots of small random IO). If you have a query-intensive use-case (e.g. E-commerce), you'll want to fit your whole index in the OS cache (or at least most of it). For logs and other time-series data, you typically have more expensive, rarer queries. There, if you have a local SSD you can make do with only a fraction of your data in the cache. I've seen servers with 4TB of disk space on 32GB of OS cache.

JVM heap can also be divided in two:

static memory, required even when the server is idle
transient memory, required by ongoing indexing/search operations

You'd see most of the static memory if you hit the _nodes/stats endpoint. It's best if you have these plotted in your Elasticsearch monitoring tool. You'll see it as segments_memory and various caches. For recent versions of Elasticsearch (e.g. 7.7 or higher), there's not a lot of memory like this - at least for most use-cases. I've seen ELK deployments with multiple TB of data definitely using less than 10GB of RAM for static memory. That said, you may reduce it by not storing info that you don't need. For example by not indexing fields you don't search on.
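If you don't have a monitoring tool yet, here is a minimal sketch of pulling those numbers out of the _nodes/stats response yourself. The base URL (no auth, no TLS) and the function names are assumptions for illustration. The field names are the ones I'd expect in the node stats JSON, but they vary between versions (newer releases may report segment memory as 0 or omit it), so missing fields are treated as 0.

```python
import json
import urllib.request


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch node stats over HTTP. base_url is an assumption about your setup."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/indices,jvm") as resp:
        return json.load(resp)


def static_memory_report(stats):
    """Per node: segment memory, cache sizes, and heap used/max, all in bytes."""
    report = {}
    for node_id, node in stats.get("nodes", {}).items():
        indices = node.get("indices", {})
        parts = {
            "segments": indices.get("segments", {}).get("memory_in_bytes", 0),
            "fielddata": indices.get("fielddata", {}).get("memory_size_in_bytes", 0),
            "query_cache": indices.get("query_cache", {}).get("memory_size_in_bytes", 0),
            "request_cache": indices.get("request_cache", {}).get("memory_size_in_bytes", 0),
        }
        mem = node.get("jvm", {}).get("mem", {})
        report[node.get("name", node_id)] = {
            **parts,
            "static_total": sum(parts.values()),
            "heap_used": mem.get("heap_used_in_bytes", 0),
            "heap_max": mem.get("heap_max_in_bytes", 0),
        }
    return report


# Made-up sample response so the parsing can be tried without a cluster.
sample_stats = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "indices": {
                "segments": {"memory_in_bytes": 2_000_000_000},
                "fielddata": {"memory_size_in_bytes": 500_000_000},
                "query_cache": {"memory_size_in_bytes": 100_000_000},
                "request_cache": {"memory_size_in_bytes": 50_000_000},
            },
            "jvm": {"mem": {"heap_used_in_bytes": 9_000_000_000,
                            "heap_max_in_bytes": 16_000_000_000}},
        }
    }
}
sample_report = static_memory_report(sample_stats)
print(sample_report)
```

Against a real cluster you would call static_memory_report(fetch_node_stats()) on a schedule and plot static_total next to heap_used, which gives a rough idea of how much heap is left for transient work.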

Transient memory will mainly depend on your queries: how often they run and how expensive they are. One-off expensive queries tend to be more dangerous, so avoid using too many levels of aggregations, massive size values, or queries that expand to too many terms (wildcards, fuzzy...). To accommodate those, you simply need heap. How much? It's really a matter of monitor-and-adjust.

Side-note: I don't agree with the general suggestion that you should stay under 32GB at all costs. With Java 11+ and G1GC, I've seen deployments with over 100GB of heap that run just fine. The overhead of uncompressed oops is not 10-20GB at every 30GB, like the docs suggest - that's an extrapolation of a worst-case scenario. In my experience, it's more like a few GB every 30GB - something like 10% for many deployments. This doesn't mean you have to use 100GB of heap, it's just that if you need a lot of heap in your cluster, you don't have to have hundreds of nodes (you can have fewer bigger ones).

Speaking of GC, it may fall behind if you run many queries that aren't terribly expensive. And then you'd run out of heap, even if you have plenty. Monitoring should tell you this, as a full GC will eventually clean up the heap with a big pause (read: cluster instability). Here, Java 11 with G1GC and a low -XX:GCTimeRatio (e.g. 3) should fix the issue.
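To get a feel for what a GCTimeRatio value means: as far as I know, the JVM documentation describes -XX:GCTimeRatio=N as a goal of spending at most 1/(1+N) of total time in garbage collection. The sketch below only does that arithmetic. The function name is made up, and how G1 actually uses the goal when sizing the heap is more nuanced than this one formula.

```python
def gc_time_fraction(gc_time_ratio):
    """Target share of total time spent in GC for -XX:GCTimeRatio=N, i.e. 1 / (1 + N)."""
    if gc_time_ratio < 0:
        raise ValueError("GCTimeRatio must not be negative")
    return 1.0 / (1.0 + gc_time_ratio)


# 3 is the example value from the answer; 12 and 99 are only there for comparison.
for ratio in (3, 12, 99):
    print(f"GCTimeRatio={ratio}: up to {gc_time_fraction(ratio):.1%} of time in GC")
```

So a value of 3 allows GC up to about a quarter of the time, which is a far more aggressive collection budget than the higher values.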

answered Sep 22 '22 12:09

Radu Gheorghe

You will need to scale out to several nodes to stay efficient. Elasticsearch has its per-node memory sweet spot at 64GB of RAM, with 32GB reserved for ES.

See https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html#_memory for more details. The book is a very good read if you are using Elasticsearch for serious work.

answered Sep 24 '22 12:09

Christian Hubinger
