Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

YARN: What is the difference between number-of-executors and executor-cores in Spark?

I am learning Spark on AWS EMR. In the process I am trying to understand the difference between number of executors(--num-executors) and executor cores (--executor-cores). Can any one please tell me here?

Also when I am trying to submit the following job, I am getting error:

spark-submit --deploy-mode cluster --master yarn --num-executors 1 --executor-cores 5   --executor-memory 1g -–conf spark.yarn.submit.waitAppCompletion=false wordcount.py s3://test/spark-example/input/input.txt s3://test/spark-example/output21

Error: Unrecognized option: -–conf
like image 487
AIR Avatar asked Apr 25 '16 23:04

AIR


People also ask

What is the difference between executors and executor core in Spark?

The cores property controls the number of concurrent tasks an executor can run. - -executor-cores 5 means that each executor can run a maximum of five tasks at the same time.

How many cores does an executor have in Spark?

The consensus in most Spark tuning guides is that 5 cores per executor is the optimum number of cores in terms of parallel processing.

How do you determine the number of executors and cores in Spark?

According to the recommendations which we discussed above: Leave 1 core per node for Hadoop/Yarn daemons => Num cores available per node = 16-1 = 15. So, Total available of cores in cluster = 15 x 10 = 150. Number of available executors = (total cores/num-cores-per-executor) = 150/5 = 30.

What is the default number of executors in Spark?

The maximum number of executors to be used. Its Spark submit option is --max-executors . If it is not set, default is 2.


1 Answers

Number of executors is the number of distinct yarn containers (think processes/JVMs) that will execute your application.

Number of executor-cores is the number of threads you get inside each executor (container).

So the parallelism (number of concurrent threads/tasks running) of your spark application is #executors X #executor-cores. If you have 10 executors and 5 executor-cores you will have (hopefully) 50 tasks running at the same time.

like image 77
marios Avatar answered Oct 31 '22 03:10

marios