I'm trying to understand the importance of setting the master property when running a Spark application.
The cluster master is listening on the default port, 7077. I'm running this app from a test machine, where it will hit an S3 bucket.
Currently, the Spark configuration in the app reads:
val sparkConf = new SparkConf()
  .setMaster("spark://127.0.0.1:7077")
but I'm also setting the master on the command line when submitting with spark-submit:
--master spark://127.0.0.1:7077
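For context, the full submit command looks roughly like this (the class name and jar path here are simplified placeholders, not the real ones):

spark-submit --master spark://127.0.0.1:7077 --class com.example.MyApp my-app.jar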
So, does having both of these set cause problems? Does one get overridden by the other? Are they both necessary?
From the SparkConf API documentation:

setMaster(String master): The master URL to connect to, such as "local" to run locally with one thread, "local[4]" to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
setSparkHome(String home): Set the location where Spark is installed on worker nodes.
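As a minimal sketch of those two setters (the app name and Spark home path below are placeholders, not values from the question):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("example-app")           // placeholder app name
  .setMaster("spark://master:7077")    // standalone cluster URL, as in the docs
  .setSparkHome("/usr/lib/spark")      // assumed install path on worker nodes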
The Spark Configuration page is very clear about the precedence:
Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.
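Given that precedence, having both set does not cause problems, but the value hard-coded with setMaster is the one that takes effect; the --master flag is effectively redundant here (they happen to point to the same URL anyway). A common pattern is to leave the master out of the code entirely and supply it at submit time, so the same jar can run locally or against the cluster. A rough sketch, with a placeholder app name:

import org.apache.spark.{SparkConf, SparkContext}

// No setMaster here: the master comes from spark-submit's --master flag
// or from spark-defaults.conf, per the precedence rules quoted above.
val conf = new SparkConf().setAppName("s3-reader")   // placeholder app name
val sc = new SparkContext(conf)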