Spark load settings from multiple configuration files

Tags: apache-spark

Spark reads the default configurations from $SPARK_HOME/conf/spark-defaults.conf.

You can also change the default location using the --properties-file [FILE] command-line argument when using (say) spark-submit.
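For instance, pointing spark-submit at a different properties file might look like this (the file path, main class, and jar name are only placeholders):

    spark-submit --properties-file /path/to/custom-spark.conf \
      --class com.example.MyApp my-app.jar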

What I want to do is load additional properties from a file without replacing the default ones. That is, I want Spark to load the properties from spark-defaults.conf and then load more properties from another file. If a property is defined in both, the later configuration file should win.

Is this supported by default in Spark?

asked Apr 12 '17 by marios

1 Answer

tl;dr No.

As described in the Spark documentation, here is the order of preference for configuration:

"Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file."

Given this, I would use Typesafe Config in my driver code to load a custom configuration file and set whatever I find there directly on the SparkConf. Anything set that way will take precedence over configuration loaded from anywhere else.
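A minimal sketch of that idea, assuming the Typesafe Config library (com.typesafe.config) is on the classpath; the file path and object name are hypothetical:

    import java.io.File
    import scala.collection.JavaConverters._
    import com.typesafe.config.ConfigFactory
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    object ConfiguredApp {
      def main(args: Array[String]): Unit = {
        // Parse the extra properties file (hypothetical path).
        val extra = ConfigFactory.parseFile(new File("/etc/spark/extra.conf"))

        // When launched via spark-submit, this SparkConf already holds the
        // values from spark-defaults.conf and any spark-submit flags.
        val conf = new SparkConf()

        // Copy every entry from the extra file onto the SparkConf. Values
        // set here take the highest precedence, per the order quoted above.
        extra.entrySet().asScala.foreach { e =>
          conf.set(e.getKey, e.getValue.unwrapped().toString)
        }

        val spark = SparkSession.builder().config(conf).getOrCreate()
        // ... application logic ...
        spark.stop()
      }
    }

Note that Typesafe Config parses dotted keys such as spark.executor.memory into nested objects, but entrySet() flattens them back to their full dotted paths, so they land on the SparkConf under the names Spark expects.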

answered Nov 10 '22 by Vidya