Spark reads the default configuration from $SPARK_HOME/conf/spark-defaults.conf.
You can also point to a different properties file with the --properties-file [FILE] command-line argument when using (say) spark-submit.
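For example (the file path, class name, and jar name below are just placeholders):

    spark-submit \
      --properties-file /path/to/my-spark.conf \
      --class com.example.MyApp \
      my-app.jar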
What I want to do is load additional properties from a file without replacing the default ones. That is, I want Spark to load the properties from spark-defaults.conf
and also load more properties from another file. If a property is defined in both files, the one from the last configuration file should win.
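For example, with these two files (the values are hypothetical), the intended result is that the second file's value of spark.executor.memory wins:

    # spark-defaults.conf
    spark.executor.memory    2g
    spark.eventLog.enabled   true

    # extra.conf (loaded second)
    spark.executor.memory    4g
    spark.serializer         org.apache.spark.serializer.KryoSerializer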
Is this supported by default in Spark?
tl;dr No.
As described in the Spark documentation, here is the order of preference for configuration:
"Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file."
Given this, I would use Typesafe Config in my driver code to load a custom configuration file and set whatever it contains directly on the SparkConf. Anything set there takes precedence over configuration coming from anywhere else.
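A minimal sketch of that idea (the file name spark-overrides.conf and the object name are placeholders; it assumes the extra file holds plain key = value entries):

    import com.typesafe.config.ConfigFactory
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    object ConfigOverrideExample {
      def main(args: Array[String]): Unit = {
        // Already populated from spark-defaults.conf and spark-submit flags
        val conf = new SparkConf()

        // Load the extra properties file (hypothetical name) with Typesafe Config
        val overrides = ConfigFactory.parseFile(new java.io.File("spark-overrides.conf"))

        // Copy every entry onto SparkConf; values set here override anything loaded before
        overrides.entrySet().forEach { entry =>
          conf.set(entry.getKey, entry.getValue.unwrapped().toString)
        }

        val spark = SparkSession.builder().config(conf).getOrCreate()
        // ... rest of the application ...
        spark.stop()
      }
    }

Because these values are set directly on the SparkConf, they sit at the top of the precedence order quoted above.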