Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spark num-executors

I have setup a 10 node HDP platform on AWS. Below is my configuration 2 Servers - Name Node and Standby Name node 7 Data Nodes and each node has 40 vCPU and 160 GB of memory.

I am trying to calculate the number of executors while submitting spark applications and after going through different blogs I am confused on what this parameter actually means.

Looking at the below blog it seems the num executors are the total number of executors across all nodes http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

But looking at the below blog it seems that the num executors is per node or server https://blogs.aws.amazon.com/bigdata/post/Tx578UTQUV7LRP/Submitting-User-Applications-with-spark-submit

Can anyone please clarify and review the below :-

  1. Is the num-executors value is per node or the total number of executors across all the data nodes.

  2. I am using the below calculation to come up with the core count, executor count and memory per executor

    Number of cores <= 5 (assuming 5) Num executors = (40-1)/5 = 7 Memory = (160-1)/7 = 22 GB

With the above calculation which would be the correct way

--master yarn-client --driver-memory 10G --executor-memory 22G --num-executors 7 --executor-cores 5 

OR

--master yarn-client --driver-memory 10G --executor-memory 22G --num-executors 49 --executor-cores 5 

Thanks, Jayadeep

like image 323
Jayadeep Jayaraman Avatar asked Sep 13 '16 11:09

Jayadeep Jayaraman


People also ask

How do you determine the number of executors in a Spark?

Number of available executors = (total cores/num-cores-per-executor) = 150/5 = 30.

How many executors can a Spark have?

One executor is created on each node allocated with Slurm when using Spark in the standalone mode (so that 5 executors would be created in the above example).

How do you check the number of executors in Pyspark?

You can check /applications/[app-id]/executors , which returns A list of all active executors for the given application.

How would you set the number of executors in any Spark based application?

The number of executors for a spark application can be specified inside the SparkConf or from the "spark-submit" by using -–num-executors. Cluster Manager: It schedules and divides resources in the host machine which forms the cluster. The prime work of the cluster manager is to divide resources across applications.


1 Answers

Can anyone please clarify and review the below :-

  1. Is the num-executors value is per node or the total number of executors across all the data nodes.

You need to first understand that the executors run on the NodeManagers (You can think of this like workers in Spark standalone). A number of Containers (includes vCPU, memory, network, disk, etc.) equal to number of executors specified will be allocated for your Spark application on YARN. Now these executor containers will be run on multiple NodeManagers and that depends on the CapacityScheduler (default scheduler in HDP).

So to sum up, total number of executors is the number of resource containers you specify for your application to run.

Refer this blog to understand better.

  1. I am using the below calculation to come up with the core count, executor count and memory per executor

Number of cores <= 5 (assuming 5) Num executors = (40-1)/5 = 7 Memory = (160-1)/7 = 22 GB

There is no rigid formula for calculating the number of executors. Instead you can try enabling Dynamic Allocation in YARN for your application.

like image 121
Rakesh Rakshit Avatar answered Sep 21 '22 17:09

Rakesh Rakshit