I have setup a 10 node HDP platform on AWS. Below is my configuration 2 Servers - Name Node and Standby Name node 7 Data Nodes and each node has 40 vCPU and 160 GB of memory.
I am trying to calculate the number of executors while submitting spark applications and after going through different blogs I am confused on what this parameter actually means.
Looking at the below blog it seems the num executors are the total number of executors across all nodes http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
But looking at the below blog it seems that the num executors is per node or server https://blogs.aws.amazon.com/bigdata/post/Tx578UTQUV7LRP/Submitting-User-Applications-with-spark-submit
Can anyone please clarify and review the below :-
Is the num-executors value is per node or the total number of executors across all the data nodes.
I am using the below calculation to come up with the core count, executor count and memory per executor
Number of cores <= 5 (assuming 5) Num executors = (40-1)/5 = 7 Memory = (160-1)/7 = 22 GB
With the above calculation which would be the correct way
--master yarn-client --driver-memory 10G --executor-memory 22G --num-executors 7 --executor-cores 5
OR
--master yarn-client --driver-memory 10G --executor-memory 22G --num-executors 49 --executor-cores 5
Thanks, Jayadeep
Number of available executors = (total cores/num-cores-per-executor) = 150/5 = 30.
One executor is created on each node allocated with Slurm when using Spark in the standalone mode (so that 5 executors would be created in the above example).
You can check /applications/[app-id]/executors , which returns A list of all active executors for the given application.
The number of executors for a spark application can be specified inside the SparkConf or from the "spark-submit" by using -–num-executors. Cluster Manager: It schedules and divides resources in the host machine which forms the cluster. The prime work of the cluster manager is to divide resources across applications.
Can anyone please clarify and review the below :-
- Is the num-executors value is per node or the total number of executors across all the data nodes.
You need to first understand that the executors run on the NodeManagers (You can think of this like workers in Spark standalone). A number of Containers (includes vCPU, memory, network, disk, etc.) equal to number of executors specified will be allocated for your Spark application on YARN. Now these executor containers will be run on multiple NodeManagers and that depends on the CapacityScheduler (default scheduler in HDP).
So to sum up, total number of executors is the number of resource containers you specify for your application to run.
Refer this blog to understand better.
- I am using the below calculation to come up with the core count, executor count and memory per executor
Number of cores <= 5 (assuming 5) Num executors = (40-1)/5 = 7 Memory = (160-1)/7 = 22 GB
There is no rigid formula for calculating the number of executors. Instead you can try enabling Dynamic Allocation in YARN for your application.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With