TL;DR
The Spark UI shows a different number of cores and amount of memory than what I ask for when using spark-submit
More details:
I'm running Spark 1.6 in standalone mode. When I run spark-submit I pass it 1 executor instance with 1 core for the executor, and also 1 core for the driver. What I would expect is that my application will be run with 2 cores total. When I check the environment tab on the UI, I see that it received the correct parameters, but it still seems to be using a different number of cores. You can see it here:
This is my spark-defaults.conf that I'm using:
spark.executor.memory 5g
spark.executor.cores 1
spark.executor.instances 1
spark.driver.cores 1
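These properties could equivalently be passed on the spark-submit command line, e.g. (the master URL and application jar are placeholders):

spark-submit \
  --master spark://<master-host>:7077 \
  --conf spark.executor.instances=1 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=5g \
  --conf spark.driver.cores=1 \
  my-app.jar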
Checking the environment tab on the Spark UI confirms that these are indeed the parameters that were received, but the UI still shows something else.
Does anyone have any idea what might cause Spark to use a different number of cores than the number I pass it? I obviously tried googling it, but didn't find anything useful on that topic.
Thanks in advance
TL;DR
Use spark.cores.max instead to define the total number of cores available, and thus limit the number of executors.
In standalone mode, a greedy strategy is used and as many executors will be created as there are cores and memory available on your worker.
In your case, you specified 1 core and 5 GB of memory per executor. Spark will therefore launch as many such executors on your worker as fit, roughly min(free cores / spark.executor.cores, free memory / spark.executor.memory) executors.
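For example (purely hypothetical numbers, since your actual worker specs are only visible in the screenshot): a worker with 16 free cores and 64 GB of free memory would get min(16 / 1, floor(64 / 5)) = 12 executors of 1 core each, i.e. 12 cores in total, even though only one executor instance was requested.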
Spark basically fulfilled what you asked for. In order to achieve what you want, you can make use of the spark.cores.max option documented here and specify the exact number of cores you need.
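As a minimal sketch (the values are just an illustration), capping the application at a single 1-core executor, in addition to the driver core, would look like this in spark-defaults.conf:

spark.cores.max 1
spark.executor.cores 1
spark.executor.memory 5g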
A few side notes:
- spark.executor.instances is a YARN-only configuration.
- spark.driver.cores defaults to 1 already.
- I am also working on making the number of executors in standalone mode easier to reason about; this might be integrated into a future release of Spark and will hopefully help you figure out exactly how many executors you are going to get, without having to calculate it on the go.