what is the relationship between spark executor and yarn container when using spark on yarn

what is the relationship between spark executor and yarn container when using spark on yarn?
For example, when I set executor-memory = 20G and yarn container memory = 10G, does 1 executor contains 2 containers?

no123ff asked Feb 06 '18 03:02
People also ask

What is the difference between container and executor?

A Spark executor runs within a YARN container, not across containers. A YARN container is provided by the YARN ResourceManager on demand, either at the start of the Spark application or via YARN dynamic resource allocation. A YARN container holds exactly one Spark executor, but one or more cores can be assigned to that executor.

What are executors when we run Spark on YARN?

So the ResourceManager grants the request of 2 cores and 2 GB of memory packed together as a container, and the process Spark launches inside such a container is an executor. The NodeManager of each individual worker node handles the resource allocation requests and is responsible for the resources of the job on that node.

What is executor in YARN?

An executor is a process that is launched for a Spark application on a worker node. Each executor's memory is the sum of the YARN overhead memory and the JVM heap memory. The JVM heap memory comprises: RDD cache memory and shuffle memory.

What is the difference between executor and executor core in Spark?

The cores property controls the number of concurrent tasks an executor can run. `--executor-cores 5` means that each executor can run a maximum of five tasks at the same time.
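The concurrency rule above can be sketched numerically. This is an illustrative calculation (the function name and the executor counts below are not from the original answers):

```python
# One task slot per executor core, summed across all executors,
# gives the maximum number of tasks Spark can run at once.
def max_concurrent_tasks(num_executors: int, executor_cores: int) -> int:
    return num_executors * executor_cores

# e.g. 4 executors launched with --executor-cores 5 each
print(max_concurrent_tasks(4, 5))  # 20
```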


1 Answer

A Spark executor runs within a YARN container, which the ResourceManager provides on demand, and each container holds exactly one executor. Executors are the processes that run the tasks, and each executor is started on a worker node (DataNode).

In your case, setting executor-memory = 20G asks YARN for a single container large enough to hold a 20 GB executor heap plus memory overhead. One executor never spans two containers: if YARN's maximum container size is 10 GB, a 20 GB executor request cannot be satisfied and the allocation fails, so you would need to lower executor-memory or raise the container limit. The 20 GB is per executor, and a worker node may run one or more such executors.
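As a rough sketch of the container sizing, assuming Spark's default overhead rule of max(384 MiB, 10% of executor memory); the constants and function name below are illustrative of that rule, not taken from the answer:

```python
# Sketch: how large a YARN container must be to host one Spark executor.
# Assumes the default overhead rule: max(384 MiB, 10% of executor memory).
MIN_OVERHEAD_MB = 384
OVERHEAD_FACTOR = 0.10

def container_size_mb(executor_memory_mb: int) -> int:
    """Container size = executor heap + memory overhead."""
    overhead = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    return executor_memory_mb + overhead

# --executor-memory 20G needs one ~22 GB container;
# it is never split across two 10 GB containers.
print(container_size_mb(20 * 1024))  # 22528
```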

So, for example, if you have a cluster of 8 nodes each running one such executor, that is 8 * 20 GB of total memory for your job.

Below are the three configuration options available in yarn-site.xml that you can play around with to see the differences.

yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
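As a sketch, these properties sit in yarn-site.xml like this; the values shown are purely illustrative, not recommendations:

```xml
<configuration>
  <!-- Smallest container YARN will allocate; requests are rounded up to a multiple of this. -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <!-- Largest single container; an executor request above this is rejected. -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>24576</value>
  </property>
  <!-- Total memory the NodeManager may hand out on this node. -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>65536</value>
  </property>
</configuration>
```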
AJm answered Sep 28 '22 10:09