I am trying to override Spark properties such as num-executors while submitting the application with spark-submit, as below:
spark-submit --class WC.WordCount \
--num-executors 8 \
--executor-cores 5 \
--executor-memory 3584M \
...../<myjar>.jar \
/public/blahblahblah /user/blahblah
However, it's running with the default number of executors, which is 2. But I am able to override the properties if I add
--master yarn
Can someone explain why that is? Interestingly, in my application code I am setting the master to yarn-client:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("wordcount")
  .setMaster("yarn-client")
  .set("spark.ui.port", "56487")
val sc = new SparkContext(conf)
Can someone shed some light on how the --master option works?
I am trying to override Spark properties such as num-executors while submitting the application with spark-submit, as below
It will not work (unless you override spark.master in the conf/spark-defaults.conf file or similar, so you don't have to specify it explicitly on the command line).
The reason is that the default Spark master is local[*] and the number of executors is exactly one, i.e. the driver. That's just the local deployment environment. See Master URLs.
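If you want to double-check which master the application actually resolved to, one option (a small sketch, not part of the original answer; the app name is arbitrary) is to print it from the running SparkContext:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal check: print the master the context actually uses.
// Submitted via spark-submit with no --master flag and no setMaster
// in the code, this typically reports local[*].
val sc = new SparkContext(new SparkConf().setAppName("master-check"))
println(s"Effective master: ${sc.master}")
sc.stop()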
As a matter of fact, num-executors is very YARN-dependent, as you can see in the help:
$ ./bin/spark-submit --help
...
YARN-only:
--num-executors NUM Number of executors to launch (Default: 2).
If dynamic allocation is enabled, the initial number of
executors will be at least NUM.
That explains why it worked when you switched to YARN: it is supposed to work with YARN regardless of the deploy mode (client or cluster), which concerns the driver alone, not the executors.
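If you want to confirm how many executors were actually granted once the application is running on YARN, one rough way (an illustrative sketch, not from the original answer) is to look at the block manager registrations from the driver:

// Given an active SparkContext sc:
// getExecutorMemoryStatus has one entry per registered executor
// plus one for the driver, so subtract 1. Executors register
// asynchronously, so the number can still grow shortly after startup.
val registeredExecutors = sc.getExecutorMemoryStatus.size - 1
println(s"Executors registered so far: $registeredExecutors")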
You may be wondering why it did not work with the master defined in your code, then. The reason is that it is too late: the master has already been assigned at launch time, when you started the application using spark-submit. That is exactly why you should not specify deployment environment-specific properties in the code.
That's also why you should always use spark-submit to submit your Spark applications (unless you've got reasons not to, but then you'd know why and could explain them with ease).
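For instance, assuming the WC.WordCount class from the command line looks roughly like the snippet in the question, a deployment-agnostic version (a sketch, with the word-count body filled in as an assumption) drops setMaster entirely and takes its paths from the arguments:

package WC

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // No setMaster here: the deployment environment (yarn, local[*], ...)
    // comes from spark-submit via --master or --conf spark.master.
    val conf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)

    sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))

    sc.stop()
  }
}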
If you'd like to run the same application with different masters or different amounts of memory, Spark lets you do that by creating an empty SparkConf. Since properties you set on the SparkConf in code take the highest precedence for the application, check the properties precedence order at the end.
Example:
val sc = new SparkContext(new SparkConf())
Then, you can supply configuration values at runtime:
./bin/spark-submit \
--name "My app" \
--deploy-mode "client" \
--conf spark.ui.port=56487 \
--conf spark.master=yarn \
--conf spark.executor.memory=4g \
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
--class WC.WordCount \
/<myjar>.jar \
/public/blahblahblah \
/user/blahblah
Here --conf spark.master=yarn is an alternative to --master yarn, and --conf spark.executor.memory=4g is an alternative to --executor-memory 4g.
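To see what configuration the application actually ended up with after the code, the flags, and spark-defaults.conf are merged, one way (an illustrative sketch, not part of the quoted docs) is to dump the resolved values from the driver:

// Given the SparkContext sc created above:
// sc.getConf reflects the merged result of all configuration sources.
sc.getConf.getAll
  .filter { case (k, _) => k == "spark.master" || k.startsWith("spark.executor") }
  .foreach { case (k, v) => println(s"$k = $v") }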
Properties precedence order (highest first):
- Properties set directly on the SparkConf (in the code) take the highest precedence.
- Then flags passed to spark-submit or spark-shell, such as --master.
- Then options in the spark-defaults.conf file.

Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.
Source: Dynamically Loading Spark Properties
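As a concrete illustration of the first rule (a hypothetical example, not from the quoted docs): suppose spark-submit is launched with --conf spark.ui.port=4041 while the code still sets the port explicitly; the value from the code wins:

import org.apache.spark.{SparkConf, SparkContext}

// new SparkConf() picks up values passed by spark-submit (flags and
// spark-defaults.conf), but an explicit set() in the code overrides them.
val conf = new SparkConf().set("spark.ui.port", "56487")
val sc = new SparkContext(conf)
println(sc.getConf.get("spark.ui.port"))   // prints 56487, not 4041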