
Spark configuration priority

Is there any difference or priority between specifying Spark application configuration in the code:


and specifying it on the command line:

spark-submit --master yarn
54l3d asked Apr 27 '16 09:04


3 Answers

Yes. The highest priority goes to configuration set in the user's code with the set() function. After that come the flags passed with spark-submit. From the Spark documentation:

Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.
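
The rule about renamed keys can be pictured with a small plain-Python model. This is only a sketch of the described behaviour, not Spark's actual implementation: the key names `spark.example.newKey` and `spark.example.oldKey`, the `DEPRECATED_ALIASES` table, and the `lookup` helper are all hypothetical.

```python
# Hypothetical table: current key name -> older names that are still accepted.
DEPRECATED_ALIASES = {"spark.example.newKey": ["spark.example.oldKey"]}


def lookup(settings, key, default=None):
    """Return the value for `key`, using an older name only if the new one is absent."""
    if key in settings:
        return settings[key]
    for old_key in DEPRECATED_ALIASES.get(key, []):
        if old_key in settings:
            return settings[old_key]
    return default


# Both names are present: the newer key wins.
both = {"spark.example.oldKey": "from-old", "spark.example.newKey": "from-new"}
print(lookup(both, "spark.example.newKey"))

# Only the older name is present: it is still honoured.
only_old = {"spark.example.oldKey": "from-old"}
print(lookup(only_old, "spark.example.newKey"))
```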


RudyVerboven answered Nov 15 '22 13:11


There are 4 precedence levels (1 to 4, 1 being the highest priority):

  1. SparkConf set in the application
  2. Properties given as flags to spark-submit
  3. Properties given in a properties file, which can be passed as an argument at submission (for example with --properties-file)
  4. Default values
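
The four levels above behave like a layered lookup where the first layer that defines a key wins. A minimal plain-Python sketch of that idea, using `collections.ChainMap` rather than Spark itself; every value in the four dictionaries is made up for illustration:

```python
from collections import ChainMap

# Illustrative values only, one dict per precedence level (highest first).
spark_conf_in_code = {"spark.executor.memory": "4g"}
spark_submit_flags = {"spark.executor.memory": "2g", "spark.master": "yarn"}
properties_file = {"spark.master": "local[2]", "spark.app.name": "demo"}
default_values = {"spark.executor.memory": "1g"}

# ChainMap searches the mappings left to right and returns the first hit,
# which mirrors "level 1 beats level 2 beats level 3 beats level 4".
effective = ChainMap(
    spark_conf_in_code, spark_submit_flags, properties_file, default_values
)

print(effective["spark.executor.memory"])  # 4g   (set in code)
print(effective["spark.master"])           # yarn (spark-submit flag)
print(effective["spark.app.name"])         # demo (properties file)
```

In this model a value set in code cannot be overridden from the command line, which is why settings you want to vary per run are usually left out of the code.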
Harikrishnan Ck answered Nov 15 '22 13:11


Other than the priority, specifying it on the command line lets you run on different cluster managers without modifying code. The same application can be run on local[n], YARN, Mesos, or a Spark standalone cluster.
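
As a rough illustration, the same application file can be paired with different `--master` values at submission time, assuming the application does not set the master in code. The helper `submit_command`, the file name `app.py`, and the host names are hypothetical; only the `spark-submit --master` flag comes from the question.

```python
import shlex


def submit_command(master, app="app.py"):
    """Build a spark-submit argument list for the given master URL."""
    return ["spark-submit", "--master", master, app]


# One unchanged application, four different cluster managers.
masters = ["local[4]", "yarn", "mesos://host:5050", "spark://host:7077"]

for master in masters:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(submit_command(master)))
```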

sparker answered Nov 15 '22 14:11
