Spark configuration priority

Is there any difference or priority between specifying Spark application configuration in the code:

SparkConf().setMaster("yarn")

and specifying it on the command line:

spark-submit --master yarn
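
For example, a minimal sketch of the two approaches (PySpark is assumed; the app name and script name are just placeholders):

    # Option 1: hard-code the master inside the application
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("my-app").setMaster("yarn")
    sc = SparkContext(conf=conf)

    # Option 2: leave the master out of the code and pass it at launch:
    #   spark-submit --master yarn my_app.py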
asked Apr 27 '16 by 54l3d


3 Answers

Yes, the highest priority is given to configuration set in the user's code with the set() function. After that come the flags passed to spark-submit.

Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.

Source
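
A quick sketch of how this plays out in practice (PySpark assumed; the memory values are arbitrary): if the same key is set both in the code and on the command line, the SparkConf value wins.

    # launched with: spark-submit --conf spark.executor.memory=2g my_app.py
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("my-app").set("spark.executor.memory", "4g")
    sc = SparkContext(conf=conf)

    # the value set on the SparkConf takes precedence over the --conf flag
    print(sc.getConf().get("spark.executor.memory"))  # expected: 4g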

answered Nov 15 '22 by RudyVerboven


There are 4 precedence levels (1 to 4, 1 being the highest priority):

  1. SparkConf set in the application
  2. Properties passed with spark-submit
  3. Properties given in a properties file, which can be passed as an argument at submission (see the sketch after this list)
  4. Default values
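
A rough sketch of level 3, assuming a hypothetical properties file named my-app.conf (same key/value format as spark-defaults.conf):

    # my-app.conf
    spark.master            yarn
    spark.executor.memory   2g

Passed at submission time:

    spark-submit --properties-file my-app.conf my_app.py

Any of the same keys set via SparkConf in the code (level 1) or via --master / --conf on the command line (level 2) would override the values from this file.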
answered Nov 15 '22 by Harikrishnan Ck


Other than the priority, specifying it on the command line allows you to run the same application on different cluster managers without modifying the code: the same application can run on local[n], yarn, mesos, or a Spark standalone cluster.
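
For example, the same unmodified application could be submitted to different cluster managers just by changing the flag (my_app.py is a placeholder):

    spark-submit --master local[4] my_app.py            # local, 4 threads
    spark-submit --master yarn my_app.py                # YARN cluster
    spark-submit --master mesos://host:5050 my_app.py   # Mesos cluster
    spark-submit --master spark://host:7077 my_app.py   # Spark standalone cluster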

answered Nov 15 '22 by sparker