 

Spark shows a different number of cores than what is passed to it using spark-submit

Tags:

apache-spark

TL;DR

The Spark UI shows a different number of cores and a different amount of memory than what I request when using spark-submit.

More details:

I'm running Spark 1.6 in standalone mode. When I run spark-submit I pass it 1 executor instance, with 1 core for the executor and also 1 core for the driver. What I would expect is that my application runs with 2 cores total. When I check the Environment tab in the UI I see that it received the correct parameters I gave it, yet it still seems to be using a different number of cores. You can see it here:

[Screenshot of the Spark UI showing more cores and memory in use than requested]
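For reference, here is roughly what the invocation looks like with those settings passed as --conf flags (the master URL, class name, and jar are placeholders; the spark-defaults.conf below achieves the same thing):

spark-submit \
  --master spark://my-master:7077 \
  --conf spark.executor.instances=1 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=5g \
  --conf spark.driver.cores=1 \
  --class com.example.MyApp \
  my-app.jar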

This is my spark-defaults.conf that I'm using:

spark.executor.memory 5g
spark.executor.cores 1
spark.executor.instances 1
spark.driver.cores 1

Checking the Environment tab on the Spark UI confirms that these are indeed the parameters that were received, but the UI still shows something else.

Does anyone have any idea what might cause Spark to use a different number of cores than what I pass it? I obviously tried googling it, but didn't find anything useful on the topic.

Thanks in advance

Asked Jun 13 '16 by Gideon




1 Answer

TL;DR

Use spark.cores.max instead to define the total number of cores available, and thus limit the number of executors.


In standalone mode, Spark uses a greedy strategy: it creates as many executors as the cores and memory available on your worker allow.

In your case, you specified 1 core and 5 GB of memory per executor. Spark will therefore calculate the following:

  • As there are 8 cores available, it will try to create 8 executors.
  • However, as there is only 30 GB of memory available, it can only create 6 executors: each executor gets 5 GB of memory, which adds up to 30 GB.
  • Therefore, 6 executors are created, and a total of 6 cores is used with 30 GB of memory.

Spark basically fulfilled what you asked for. In order to achieve what you want, you can make use of the spark.cores.max option documented here and specify the exact number of cores you need.
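For example, to end up with the single 1-core executor described in the question (assuming spark.executor.cores stays at 1), you could add the following to spark-defaults.conf:

spark.cores.max 1

or set it at submit time; in standalone mode the dedicated spark-submit flag for this is --total-executor-cores:

spark-submit --conf spark.cores.max=1 ...
spark-submit --total-executor-cores 1 ...

Either way, the master stops allocating executors once the requested number of cores is reached, instead of greedily taking everything the worker offers.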

A few side notes:

  • spark.executor.instances is a YARN-only configuration
  • spark.driver.cores already defaults to 1
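One more quick check that can help here: running spark-submit with the --verbose flag prints the configuration it resolved (defaults, spark-defaults.conf, and command-line options merged), which makes it easier to confirm which values were actually picked up, e.g.:

spark-submit --verbose --conf spark.cores.max=1 ...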

I am also working on making the number of executors in standalone mode easier to reason about; this might get integrated into an upcoming release of Spark and hopefully help you figure out exactly how many executors you are going to get, without having to calculate it on the fly.

Answered Oct 06 '22 by Jonathan Taws