 

Spark-submit master URL and SparkSession master URL in the main class: what is the difference?

Tags:

apache-spark

When submitting a job with spark-submit I set the master URL and specify a main class, e.g.:

spark-submit --class WordCount --master spark://spark:7077 my.jar

But inside this main class my SparkSession defines another master URL:

SparkSession.builder().appName("Word2vec").master("local").getOrCreate()

This confuses me: what happens if I submit a job with spark-submit to the master of a standalone cluster (spark://spark:7077), but the application itself starts a SparkSession with a local master?

Should the SparkSession master URL always be the same as the spark-submit master URL when the job is executed on a cluster?

Asked Aug 08 '16 by Quentin

People also ask

What is the difference between SparkSession and SparkContext?

SparkSession vs SparkContext: in earlier versions of Spark and PySpark, SparkContext (JavaSparkContext for Java) was the entry point for programming with RDDs and for connecting to the Spark cluster. Since Spark 2.0, SparkSession has been the entry point for programming with DataFrames and Datasets.
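
A minimal sketch of the two entry points (the file name words.txt is just a placeholder; the app name is taken from the question):

import org.apache.spark.sql.SparkSession

// Since Spark 2.0 a single SparkSession is the entry point;
// the underlying SparkContext is still reachable from it.
val spark = SparkSession.builder().appName("Word2vec").getOrCreate()

val sc  = spark.sparkContext            // pre-2.0 entry point (RDD API)
val rdd = sc.textFile("words.txt")      // RDD
val df  = spark.read.text("words.txt")  // DataFrame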

What is master in SparkSession?

SparkSession.Builder.master(String master) sets the Spark master URL to connect to, such as "local" to run locally, "local[4]" to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster.

What is master in spark submit?

--master: the master URL for the cluster (e.g. spark://23.195.26.187:7077)
--deploy-mode: whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client); default: client
--conf: arbitrary Spark configuration property in key=value format
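
As a sketch, the submission from the question could carry all three flags (the deploy mode and memory setting here are only illustrations):

spark-submit --class WordCount --master spark://spark:7077 --deploy-mode client --conf spark.executor.memory=2g my.jar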

Should I use SparkSession or SparkContext?

As a result, when comparing SparkSession vs SparkContext, as of Spark 2.0.0 it is better to use SparkSession, because it provides access to all of the Spark features that the other three APIs (SparkContext, SQLContext, and HiveContext) do.
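
A short sketch of what that means in practice; the older contexts are exposed as members of the session:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Word2vec").getOrCreate()

spark.sparkContext    // the SparkContext, for RDDs and broadcast variables
spark.sqlContext      // the SQLContext, kept for backwards compatibility
spark.sql("SELECT 1") // SQL queries, formerly the job of SQLContext / HiveContext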


1 Answer

There is no difference between these properties. If both are set, the value set directly in the application takes precedence. To quote the documentation:

Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file. A few configuration keys have been renamed since earlier versions of Spark; in such cases, the older key names are still accepted, but take lower precedence than any instance of the newer key.
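
Applied to the question: the hard-coded .master("local") lives on the SparkConf, so it overrides the --master spark://spark:7077 flag and the application ends up running with the local master even though it was submitted to the standalone cluster. A minimal sketch of the usual approach (the app name is taken from the question, everything else is a placeholder): leave the master out of the code and let spark-submit supply it.

import org.apache.spark.sql.SparkSession

// No .master() here: the value comes from spark-submit's --master flag,
// or from spark-defaults.conf if no flag is given.
val spark = SparkSession.builder().appName("Word2vec").getOrCreate()

println(spark.sparkContext.master)  // prints the master actually in effect

The same jar can then be submitted to the standalone cluster with --master spark://spark:7077 or tested locally with --master local[*], without touching the code.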

Answered Nov 15 '22 by zero323