How many shards and replicas do I have to set to use every CPU core in my cluster (I want 100% load and the fastest query results)?
I want to use Elasticsearch for aggregations. I read that Elasticsearch uses multiple CPU cores, but found no exact details about how CPU cores relate to sharding and replicas.
My observation is that a single shard does not use more than one core/thread at query time (considering there is only one query at a time). With replicas, queries against a 1-shard index are not faster, since Elasticsearch does not seem to use the other nodes to distribute the load on a shard.
My questions (one query at a time):
The Elasticsearch process is very memory intensive. Elasticsearch runs on a JVM (Java Virtual Machine), and close to 50% of the memory available on a node should be allocated to the JVM heap.
To scale out Elasticsearch, you provision extra storage, update the deployment configuration, and then perform a helm upgrade. You can verify that the scale out was successful by checking cluster status and health, as well as node and shard health. To avoid data loss, you scale in Elasticsearch one node at a time.
When performing an operation (indexing, searching, bulk indexing, etc.), a shard on a node uses one thread of execution, meaning one CPU core.
If only one query is running at a given moment, it will use one CPU core per shard. For example, a three-node cluster with a single index that has 6 primary shards and one replica will have 12 shards in total, 4 on each node.
If only one query is running on the cluster for that index, ES will query all 6 shards of the index, using one copy of each (no matter whether it is a primary or a replica). Each node will therefore use between 0 and 4 CPU cores for the job, because the round-robin algorithm ES uses to choose which copy of a shard performs the search may pick none of the copies on a node, or at most all 4 of them.
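The arithmetic in this example can be sketched in a few lines of Python. The function name `shard_layout` is hypothetical, and the per-node figure assumes the cluster spreads shard copies evenly across nodes, which is typical but not guaranteed.

```python
def shard_layout(primaries, replicas, nodes):
    """Return (total shard copies, copies per node), assuming an even spread."""
    # Each primary has `replicas` additional copies.
    total = primaries * (1 + replicas)
    # Assumption: the cluster balances copies evenly across nodes.
    return total, total / nodes


total, per_node = shard_layout(primaries=6, replicas=1, nodes=3)
print(total, per_node)  # 12 4.0
```

With one query at a time, only 6 of those 12 copies do work (one copy per shard), so at most 6 cores across the whole cluster are busy for that query.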
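To see where the "between 0 and 4 cores per node" range comes from, here is a minimal sketch that enumerates every way one copy per shard could be chosen for a single query. The shard placement below is a hypothetical one (primary of shard `s` on node `s % 3`, its replica on the next node); real clusters may place copies differently. It does not simulate the round-robin itself, only the set of outcomes it could produce.

```python
from itertools import product

NODES = 3
PRIMARIES = 6

# Hypothetical placement: copies[shard] = (node holding primary, node holding replica)
copies = [(s % NODES, (s + 1) % NODES) for s in range(PRIMARIES)]


def busy_cores_range():
    """Return per-node (min, max) busy cores over all copy selections."""
    low = [PRIMARIES] * NODES
    high = [0] * NODES
    # pick = 0 means the primary serves the search, 1 means the replica does.
    for choice in product((0, 1), repeat=PRIMARIES):
        load = [0] * NODES
        for shard, pick in enumerate(choice):
            load[copies[shard][pick]] += 1  # one thread/core per searched shard
        for n in range(NODES):
            low[n] = min(low[n], load[n])
            high[n] = max(high[n], load[n])
    return low, high


low, high = busy_cores_range()
print(low, high)  # [0, 0, 0] [4, 4, 4]
```

Whatever the selection, the total across the cluster is always 6 busy cores, one per shard of the index; only their distribution over the nodes varies.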