Spark Standalone Number Executors/Cores Control

So I have a Spark standalone server with 16 cores and 64GB of RAM. I have both the master and a worker running on that server. I don't have dynamic allocation enabled. I am on Spark 2.0.

What I don't understand is when I submit my job and specify:

--num-executors 2
--executor-cores 2 

only 4 cores should be taken up. Yet when the job is submitted, it takes all 16 cores and spins up 8 executors regardless, bypassing the --num-executors parameter. But if I change --executor-cores to 4, it adjusts accordingly and 4 executors spin up.
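
For reference, a submit command along these lines shows where those flags would be passed (the master URL, class name, and jar are placeholders, not details taken from the question):

spark-submit \
  --master spark://<master-host>:7077 \
  --num-executors 2 \
  --executor-cores 2 \
  --class com.example.MyApp \
  my-app.jar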

asked Sep 08 '16 by theMadKing

People also ask

How many executors should I use in Spark?

The consensus in most Spark tuning guides is that 5 cores per executor is the optimum number for parallel processing.

How do you determine the number of executors and cores in Spark?

Following the usual recommendation, leave 1 core per node for Hadoop/YARN daemons, so cores available per node = 16 - 1 = 15. With 10 nodes in the cluster, total available cores = 15 x 10 = 150. Number of executors = total cores / cores per executor = 150 / 5 = 30.

How do I set executor cores in Spark?

Every Spark executor in an application has the same fixed number of cores and the same fixed heap size. The number of cores can be specified with the --executor-cores flag when invoking spark-submit, spark-shell, or pyspark from the command line, or by setting the spark.executor.cores property in the spark-defaults.conf file.
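
As a minimal sketch, either form below requests 5 cores per executor (the class name and jar are placeholders):

spark-submit \
  --executor-cores 5 \
  --class com.example.MyApp \
  my-app.jar

# or, equivalently, in spark-defaults.conf:
spark.executor.cores 5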

Which command specifies the number of executor cores for a Spark standalone cluster per executor process?

You can also pass an option --total-executor-cores <numCores> to control the number of cores that spark-shell uses on the cluster.
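
For example, a spark-shell invocation along these lines caps the job at 8 cores across the cluster (the master URL is a placeholder):

spark-shell \
  --master spark://<master-host>:7077 \
  --total-executor-cores 8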


1 Answer

Disclaimer: I really don't know if --num-executors should work or not in standalone mode. I haven't seen it used outside YARN.

Note: As pointed out by Marco, --num-executors is no longer in use on YARN.

You can effectively control the number of executors in standalone mode with static allocation (this works on Mesos as well) by combining spark.cores.max and spark.executor.cores, where the number of executors is determined as:

floor(spark.cores.max / spark.executor.cores)

For example:

--conf "spark.cores.max=4" --conf "spark.executor.cores=2"
answered Sep 28 '22 by zero323