Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to max out CPU cores on Elasticsearch cluster

How many shards and replicas do I have to set to use every cpu core (I want 100% load, fastest query results) in my cluster?

I want to use Elasticsearch for aggregations. I read that Elasticsearch uses multiple cpu cores, but found no exact details about cpu cores regarding sharding and replicas.

My observations are, that a single shard does not use more than 1 core/thread at query time (considerung there is only one query at a time). With replicas, the query of a 1-shard index are not faster, since Elasticsearch does not seem to use the other nodes to distribute the load on a shard.

My questions (one query at a time):

  • A shard does not use more than one cpu core?
  • Shards must always be scanned completely, replicas cannot be used to divide intra-shard load between nodes?
  • The formular for best performance is SUM(CPU_CORES per node) * PRIMARY_SHARDS?
like image 839
static-max Avatar asked Nov 09 '15 14:11

static-max


People also ask

Is Elasticsearch memory or CPU intensive?

The Elasticsearch process is very memory intensive. Elasticsearch uses a JVM (Java Virtual Machine), and close to 50% of the memory available on a node should be allocated to JVM.

How do I scale a Elasticsearch cluster?

To scale out Elasticsearch, you provision extra storage, update the deployment configuration, and then perform a helm upgrade. You can verify that the scale out was successful by checking cluster status and health, as well as node and shard health. To avoid data loss, you scale in Elasticsearch one node at a time.


1 Answers

When doing an operation (indexing, searching, bulk indexing etc) a shard on a node uses one thread of execution, meaning one CPU core.

If you have one query running at a given moment, that will use one CPU core per shard. For example, a three node cluster with a single index that has 6 primary shards and one replica, will have in total 12 shards, 4 shards on each node.

If there is only one query running on the cluster, for that index, ES will query all the 6 shards of the index (no matter if they are primaries or replicas) and each node will use between 0 and 4 CPU cores for the job, because the round-robin algorithm used by ES to choose which copy of a shard performs the search can choose no shards on one node or maximum 4 shards on one node.

like image 92
Andrei Stefan Avatar answered Sep 17 '22 12:09

Andrei Stefan