 

Why does Spark submit script spark-submit ignore `--num-executors`?

We have Spark 1.0.0 running under YARN, and --num-executors does not seem to increase the number of executors or nodes used. I ask for 8, but I usually end up with between 3 and 5. There are no errors in the output, which is what I would expect if nodes were down and couldn't be reached.

NOTE: If you are NOT running under YARN (e.g. in Spark standalone mode), then --num-executors will be ignored. See the accepted answer for the solution and comments.

UPDATE: If I ask for X resources, I want X resources, and if I can't have them I want to be put in a queue or given an error message of some sort. This is because my job will fail if I don't get X resources: I know exactly how many resources I need before my job falls over. I don't want to implement an extra layer over my job that checks how many executors and nodes it is going to be given so that it can gracefully kill the job before it blows up of its own accord. So the second part of the question is: 1) is there a way to tell YARN/Spark to fail if I can't get the executors I want? 2) Can I stop YARN from putting more than 1 executor on the same node?
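(For reference, later Spark releases expose scheduler settings that approximate part 1: they make the driver wait until a fraction of the requested executors have registered before scheduling the first stage, though they delay scheduling rather than hard-fail. A sketch; the values and job file are illustrative:)

```shell
# Illustrative only: wait until 100% of requested executors have registered,
# for at most 60s, before tasks are scheduled. Note Spark still proceeds
# after the timeout rather than failing outright.
spark-submit \
  --conf spark.scheduler.minRegisteredResourcesRatio=1.0 \
  --conf spark.scheduler.maxRegisteredResourcesWaitingTime=60s \
  my_job.py
```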

(In 0.9.0 this was not a problem, N nodes meant N workers and jobs would just queue)

samthebest asked Jan 07 '15


1 Answer

So yes, the reason why --num-executors was not being respected in my original situation (i.e. under YARN) was some kind of buggy behaviour where it won't give you all the executors you asked for if doing so would take you over the max cores/memory.

One way to (a) protect against this (and thus answer my second question) and (b) force a number of executors when running in Spark standalone mode (thus addressing the note) is to pass both --total-executor-cores and --executor-cores to spark-submit, calculating the total executor cores in a script:

total_executor_cores=$((num_executors * executor_cores))

Now if you can't get the number of executors you want, the job will sit in the "waiting" state and won't start.
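Putting the above together, a minimal launcher sketch (the counts, master URL, and job file are placeholders, not from the original post):

```shell
# Hypothetical counts -- substitute your own.
num_executors=8
executor_cores=4

# Standalone mode has no --num-executors, so derive the total core budget;
# Spark then carves it into executors of --executor-cores each.
total_executor_cores=$((num_executors * executor_cores))
echo "requesting ${total_executor_cores} total cores"

# The actual submission (master URL and job file are placeholders):
# spark-submit \
#   --master spark://master:7077 \
#   --executor-cores "${executor_cores}" \
#   --total-executor-cores "${total_executor_cores}" \
#   my_job.py
```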

For YARN mode, having these conflicting arguments that duplicate information is quite annoying.

NOTE: When using auto scaling clusters, you'll want to avoid controlling the number of executors via total cores and instead control the number of nodes via the auto-scaling settings.

samthebest answered Sep 26 '22