TL;DR
The Spark UI shows a different number of cores and amount of memory than what I ask for when using spark-submit
More details:
I'm running Spark 1.6 in standalone mode. When I run spark-submit I pass it 1 executor instance with 1 core for the executor, and also 1 core for the driver. What I would expect is that my application will be run with 2 cores total. When I check the environment tab on the UI, I see that it received the correct parameters, but it still seems to be using a different number of cores. You can see it here:
This is my spark-defaults.conf that I'm using:
spark.executor.memory 5g
spark.executor.cores 1
spark.executor.instances 1
spark.driver.cores 1
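These properties could equivalently be passed on the spark-submit command line, e.g. (the master URL and application jar are placeholders):

spark-submit \
  --master spark://<master-host>:7077 \
  --conf spark.executor.instances=1 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=5g \
  --conf spark.driver.cores=1 \
  my-app.jar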
Checking the environment tab on the Spark UI confirms that these are indeed the parameters that were received, but the UI still shows something else.
Does anyone have any idea what might cause Spark to use a different number of cores than the number I pass it? I obviously tried googling it, but didn't find anything useful on that topic.
Thanks in advance
TL;DR
Use spark.cores.max instead to define the total number of cores available, and thus limit the number of executors.
In standalone mode, a greedy strategy is used and as many executors will be created as there are cores and memory available on your worker.
In your case, you specified 1 core and 5 GB of memory per executor. Spark will therefore launch as many such executors on your worker as fit, roughly min(free cores / spark.executor.cores, free memory / spark.executor.memory) executors.
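For example (purely hypothetical numbers, since your actual worker specs are only visible in the screenshot): a worker with 16 free cores and 64 GB of free memory would get min(16 / 1, floor(64 / 5)) = 12 executors of 1 core each, i.e. 12 cores in total, even though only one executor instance was requested.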
Spark basically fulfilled what you asked for. In order to achieve what you want, you can make use of the spark.cores.max option documented here and specify the exact number of cores you need.
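As a minimal sketch (the values are just an illustration), capping the application at a single 1-core executor, in addition to the driver core, would look like this in spark-defaults.conf:

spark.cores.max 1
spark.executor.cores 1
spark.executor.memory 5g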
A few side notes:
- spark.executor.instances is a YARN-only configuration.
- spark.driver.cores defaults to 1 already.
- I am also working on making the number of executors in standalone mode easier to reason about; this might be integrated into a future release of Spark and will hopefully help you figure out exactly how many executors you are going to get, without having to calculate it on the go.