Spark reads its default configuration from $SPARK_HOME/conf/spark-defaults.conf.
You can also point Spark at a different properties file using the --properties-file [FILE] command-line argument when using (say) spark-submit.
What I want to do is load additional properties from a file without replacing the default ones. That is, I want Spark to load the properties from spark-defaults.conf and then load more properties from another file. If a property is defined in both, I would prefer that the last configuration file win.
Is this supported by default in Spark?
tl;dr No.
As described in the Spark documentation, here is the order of preference for configuration:
"Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file."
Given this, I would use Typesafe Config in my driver code to load a custom configuration file and set whatever I find there directly on the SparkConf. Anything set on the SparkConf takes precedence over configuration coming from anywhere else.
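A minimal sketch of that approach (the file path, object name, and spark.* key filter are placeholders for illustration, not anything Spark requires):

```scala
import java.io.File

import com.typesafe.config.ConfigFactory
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object ExtraConfigExample {
  def main(args: Array[String]): Unit = {
    // A plain SparkConf already picks up spark-defaults.conf values and
    // any flags passed to spark-submit (via system properties).
    val conf = new SparkConf()

    // Hypothetical "extra" properties file; adjust the path for your setup.
    val extra = ConfigFactory.parseFile(new File("/etc/myapp/spark-extra.conf"))

    // Copy every spark.* entry onto the SparkConf. Values set directly on
    // SparkConf take highest precedence, so the extra file effectively wins
    // over spark-defaults.conf and spark-submit flags.
    extra.entrySet().forEach { entry =>
      val key = entry.getKey
      if (key.startsWith("spark.")) conf.set(key, extra.getString(key))
    }

    val spark = SparkSession.builder().config(conf).getOrCreate()
    // ... application logic ...
    spark.stop()
  }
}
```

With this pattern, spark-defaults.conf still supplies the defaults, and whatever the extra file defines overrides them, which matches the "last file wins" behavior you describe.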