How can I configure the number of executors from Java (or Scala) code, given a SparkConf and a SparkContext? I constantly see only 2 executors. It looks like spark.default.parallelism does not work and is about something different.
I just need to set the number of executors equal to the cluster size, but there are always only 2 of them. I know my cluster size. I run on YARN, if this matters.
Following the usual sizing recommendations, take a cluster of 10 nodes with 15 usable cores each and 5 cores per executor. Total cores available in the cluster = 15 x 10 = 150. Number of available executors = (total cores / cores per executor) = 150 / 5 = 30. Leaving 1 executor for the YARN ApplicationMaster gives --num-executors = 29. Number of executors per node = 30 / 10 = 3.
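The arithmetic above can be sketched as a small helper. This is only an illustration: the class name ExecutorSizing and its method names are made up for this example and are not part of Spark, and the inputs (10 nodes, 15 usable cores per node, 5 cores per executor) are the figures from the paragraph above.

```java
// Hypothetical helper (not a Spark API): reproduces the sizing arithmetic above.
final class ExecutorSizing {
    private ExecutorSizing() {}

    /** Total cores available to Spark across the whole cluster. */
    static int totalCores(int nodes, int coresPerNode) {
        return nodes * coresPerNode;
    }

    /** How many executors of the given width fit into the cluster. */
    static int executorSlots(int nodes, int coresPerNode, int coresPerExecutor) {
        if (nodes <= 0 || coresPerExecutor <= 0) {
            throw new IllegalArgumentException("nodes and coresPerExecutor must be positive");
        }
        return totalCores(nodes, coresPerNode) / coresPerExecutor;
    }

    /** Value for --num-executors: one slot is left for the YARN ApplicationMaster. */
    static int numExecutors(int nodes, int coresPerNode, int coresPerExecutor) {
        return Math.max(0, executorSlots(nodes, coresPerNode, coresPerExecutor) - 1);
    }

    /** Executors that land on each node, assuming an even spread. */
    static int executorsPerNode(int nodes, int coresPerNode, int coresPerExecutor) {
        return executorSlots(nodes, coresPerNode, coresPerExecutor) / nodes;
    }

    public static void main(String[] args) {
        int nodes = 10, coresPerNode = 15, coresPerExecutor = 5;
        System.out.println("total cores        = " + totalCores(nodes, coresPerNode));
        System.out.println("executor slots     = " + executorSlots(nodes, coresPerNode, coresPerExecutor));
        System.out.println("--num-executors    = " + numExecutors(nodes, coresPerNode, coresPerExecutor));
        System.out.println("executors per node = " + executorsPerNode(nodes, coresPerNode, coresPerExecutor));
    }
}
```

With the figures above this yields 150 total cores, 30 slots, 29 for --num-executors and 3 executors per node. Integer division rounds down, so a cluster whose cores do not divide evenly simply leaves the remainder unused.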
The first Spark job starts with two executors (because the minimum number of nodes is set to two in this example). The cluster can autoscale to a maximum of ten executors (because the maximum number of nodes is set to ten).
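If the autoscaling described here is backed by Spark's dynamic allocation (an assumption; the text does not say which mechanism is used), the matching bounds would be expressed with the spark.dynamicAllocation.* properties. The sketch below only assembles those properties and prints them as spark-submit --conf arguments. The class name DynamicAllocationProps and its methods are hypothetical, and the values 2 and 10 come from the example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Spark API): collects dynamic-allocation settings.
final class DynamicAllocationProps {
    private DynamicAllocationProps() {}

    /** Builds the property map for a given executor range. */
    static Map<String, String> build(int minExecutors, int maxExecutors) {
        if (minExecutors < 0 || maxExecutors < minExecutors) {
            throw new IllegalArgumentException("need 0 <= minExecutors <= maxExecutors");
        }
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("spark.dynamicAllocation.enabled", "true");
        props.put("spark.dynamicAllocation.minExecutors", Integer.toString(minExecutors));
        props.put("spark.dynamicAllocation.maxExecutors", Integer.toString(maxExecutors));
        return props;
    }

    /** Renders the properties as "--conf key=value" pairs for spark-submit. */
    static List<String> toSubmitArgs(Map<String, String> props) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        // Start with two executors, allow growth up to ten.
        System.out.println(String.join(" ", toSubmitArgs(build(2, 10))));
    }
}
```

The same key/value pairs could instead be passed to SparkConf.set before the SparkContext is created. Dynamic allocation usually also needs the external shuffle service or shuffle tracking to be enabled, so check the Spark configuration documentation for your version.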
If a node has enough memory, it can host two or more executors on the same machine.
As a rough guide, set the executor memory between 8 GB and 16 GB. This is an arbitrary choice governed by the two points above. Pack as many executors as fit on one cluster node, and distribute cores evenly across all executors.
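You can also do it programmatically by setting the parameters "spark.executor.instances" and "spark.executor.cores" on the SparkConf object before the SparkContext is created. On YARN, spark.executor.instances defaults to 2 when it is not set, which is why only two executors show up.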
Example:
import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
    // total number of executors requested for the application
    .set("spark.executor.instances", "4")
    // 5 cores on each executor
    .set("spark.executor.cores", "5");
The second parameter (spark.executor.cores) applies only to YARN and standalone mode. In standalone mode, it allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker.