Is there any difference or priority between specifying the Spark application configuration in the code:
SparkConf().setMaster("yarn")
and specifying it on the command line:
spark-submit --master yarn
Yes. The highest priority is given to the configuration set in the user's code with the set() function; after that come the flags passed to spark-submit.
Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.
Source
There are 4 precedence levels (1 to 4, 1 being the highest priority):
1. Properties set directly on the SparkConf in the application code
2. Flags passed to spark-submit or spark-shell
3. Options in the spark-defaults.conf file
4. Spark's built-in default values
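To see level 1 beating level 2 in practice, here is a minimal PySpark sketch (the file name precedence_demo.py and the app name are assumptions for illustration): the master hard-coded on the SparkConf wins over whatever --master is passed to spark-submit.

# precedence_demo.py -- illustrates that SparkConf.set() outranks spark-submit flags
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Master is hard-coded in the code (precedence level 1)
conf = SparkConf().setMaster("local[2]").setAppName("precedence-demo")
spark = SparkSession.builder.config(conf=conf).getOrCreate()

# Even if this script is submitted with `spark-submit --master yarn precedence_demo.py`,
# the application still runs with master local[2], because the value set on SparkConf
# takes precedence over the --master flag.
print(spark.sparkContext.master)  # prints local[2]
spark.stop()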
Other than the priority, specifying the master on the command line allows you to run the application on different cluster managers without modifying the code: the same application can be run on local[n], YARN, Mesos, or a Spark standalone cluster.
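As a sketch of that portable pattern (the file name portable_app.py and the app name are assumptions), the application simply omits setMaster, so the cluster manager is chosen entirely at submit time:

# portable_app.py -- no master hard-coded; the cluster manager is picked by spark-submit
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("portable-app").getOrCreate()
print(spark.range(10).count())
spark.stop()

# The same script can then be submitted unchanged to different cluster managers, e.g.:
#   spark-submit --master local[4]           portable_app.py
#   spark-submit --master yarn               portable_app.py
#   spark-submit --master spark://host:7077  portable_app.py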