I have an Apache Spark application running on a YARN cluster (Spark has 3 nodes on this cluster) in cluster mode.
While the application is running, the Spark UI shows 2 executors (each running on a different node) and the driver running on the third node. I want the application to use more executors, so I tried adding the --num-executors argument to spark-submit and setting it to 6.
spark-submit --driver-memory 3G --num-executors 6 --class main.Application --executor-memory 11G --master yarn-cluster myJar.jar <arg1> <arg2> <arg3> ...
However, the number of executors remains 2.
In the Spark UI I can see that the parameter spark.executor.instances is 6, just as I intended, and yet there are still only 2 executors.
I even tried setting this parameter from the code:
sparkConf.set("spark.executor.instances", "6")
Again, I can see that the parameter was set to 6, but still there are only 2 executors.
Does anyone know why I couldn't increase the number of my executors?
yarn.nodemanager.resource.memory-mb is 12g in yarn-site.xml
SparkConf conf = new SparkConf()
        .set("spark.executor.instances", "4")  // 4 executors per instance of each worker
        .set("spark.executor.cores", "5");     // 5 cores on each executor
spark.executor.instances acts as a minimum number of executors, with a default value of 2. This minimum does not mean the Spark application waits for that many executors to launch before it starts; the minimum only applies to autoscaling.
The cores property controls the number of concurrent tasks an executor can run. --executor-cores 5 means that each executor can run a maximum of five tasks at the same time.
Tuning Spark cluster parameters: is it necessary to always set the number of executor instances for a Spark job? Yes, you need to define it every time before starting the Spark job.
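To make the relationship concrete (using the numbers mentioned above purely as an illustration): the two settings multiply, so

num-executors * executor-cores = 6 * 5 = 30 tasks that can run in parallel at most across the application.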
Increase yarn.nodemanager.resource.memory-mb in yarn-site.xml.
With 12g per node you can only launch the driver (3g) and 2 executors (11g each).
Node1 - driver 3g (+7% overhead)
Node2 - executor1 11g (+7% overhead)
Node3 - executor2 11g (+7% overhead)
Now you are requesting executor3 with 11g, and no node has 11g of memory available.
For the 7% overhead, refer to spark.yarn.executor.memoryOverhead and spark.yarn.driver.memoryOverhead in https://spark.apache.org/docs/1.2.0/running-on-yarn.html
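To spell the arithmetic out (a rough sketch that assumes only the default ~7% overhead and no other containers running on these nodes):

Node1: 12g - (3g * 1.07 ≈ 3.2g for the driver) ≈ 8.8g free
Node2: 12g - (11g * 1.07 ≈ 11.8g for executor1) ≈ 0.2g free
Node3: 12g - (11g * 1.07 ≈ 11.8g for executor2) ≈ 0.2g free

A third executor would need another ~11.8g container, and no node has that much left, so the extra executor requests simply stay pending.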
Note that yarn.nodemanager.resource.memory-mb is the total memory that a single NodeManager can allocate across all containers on one node. In your case, since yarn.nodemanager.resource.memory-mb = 12G, if you add up the memory allocated to all YARN containers on any single node, it cannot exceed 12G.
You have requested 11G (--executor-memory 11G) for each Spark executor container. Although 11G is less than 12G, this still won't work. Why? Each container also needs spark.yarn.executor.memoryOverhead, which is max(executorMemory * 0.10, 384m) by default, unless you override it. So the following math must hold true:

spark.executor.memory + spark.yarn.executor.memoryOverhead <= yarn.nodemanager.resource.memory-mb
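Plugging in the numbers from the question as a quick sanity check (assuming the default overhead of max(executorMemory * 0.10, 384m) described above):

11G + max(11G * 0.10, 384m) = 11G + 1.1G ≈ 12.1G > 12G

so an 11G executor plus its overhead already exceeds the 12G per-node limit by this formula.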
See https://spark.apache.org/docs/latest/running-on-yarn.html for the latest documentation on spark.yarn.executor.memoryOverhead.
Moreover, spark.executor.instances is merely a request. Your application's Spark ApplicationMaster will ask the YARN ResourceManager for number of containers = spark.executor.instances. The ResourceManager grants a request on a NodeManager node only if the yarn.nodemanager.resource.memory-mb threshold has not been exceeded on that node, i.e.:

(spark.executor.memory + spark.yarn.executor.memoryOverhead) <= yarn.nodemanager.resource.memory-mb
If the request is not granted, it will be queued and granted later, once the above conditions are met.