When submitting a job with spark-submit I set the master URL and give it a main class, e.g.:
spark-submit --class WordCount --master spark://spark:7077 my.jar
But inside this main class, my SparkSession builder defines another master URL:
SparkSession.builder().appName("Word2vec").master("local")
This confuses me: what happens if I submit a job with spark-submit to the master of a standalone cluster (spark://spark:7077) and the application then starts a SparkSession with a local master?
Should the SparkSession master URL always be the same as the spark-submit master URL when executing on a cluster?
SparkSession vs SparkContext – since the earliest versions of Spark (and PySpark), SparkContext (JavaSparkContext in Java) has been the entry point for programming with RDDs and for connecting to a Spark cluster. Since Spark 2.0, SparkSession has been introduced and is the entry point for programming with DataFrames and Datasets.
SparkSession.Builder.master(String master) sets the Spark master URL to connect to, such as "local" to run locally, "local[4]" to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.
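A common pattern is therefore to not call .master() in the application at all, so that the URL passed via spark-submit --master is used instead of a hard-coded one. A minimal sketch (class and app name are just illustrative):

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // No .master(...) here: the master URL is taken from spark-submit
    val spark = SparkSession.builder()
      .appName("WordCount")
      .getOrCreate()
    // ... job logic ...
    spark.stop()
  }
}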
--master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
--conf: Arbitrary Spark configuration property in key=value format.
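Putting those flags together, a submission to the standalone master might look like this (the memory value is only a placeholder):

spark-submit --class WordCount --master spark://spark:7077 --deploy-mode cluster --conf spark.executor.memory=2g my.jar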
As a result, when comparing SparkSession vs SparkContext as of Spark 2.0.0, it is better to use SparkSession because it provides access to all of the Spark features that the older entry points (SparkContext, SQLContext, and HiveContext) expose.
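As a rough sketch of what that means in practice (file paths are hypothetical), a single SparkSession gives you the DataFrame API directly and still exposes the underlying SparkContext for RDD work:

val spark = SparkSession.builder().appName("Example").getOrCreate()
val sc  = spark.sparkContext              // underlying SparkContext for RDDs
val rdd = sc.textFile("input.txt")        // RDD API
val df  = spark.read.json("people.json")  // DataFrame API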
There is no difference between these properties. If both are set, the value set directly in the application takes precedence. To quote the documentation:
Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.
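Applied to the question above, that precedence rule means a master set on the builder wins over the spark-submit flag. A minimal sketch of the (assumed) outcome:

// Submitted with: spark-submit --master spark://spark:7077 ... my.jar
val spark = SparkSession.builder()
  .appName("Word2vec")
  .master("local[*]")                // set in the application: takes precedence
  .getOrCreate()
println(spark.sparkContext.master)   // prints local[*], not spark://spark:7077
// The job therefore runs locally inside the driver, not on the cluster.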