 

Difference between spark-submit and SparkSession in a Python script?

Are there pros and cons, or different use cases, for using spark-submit to submit a Python script versus simply running the .py file with the python executable (and importing SparkSession), like this?

from pyspark.sql import SparkSession

master = "local[*]"  # placeholder value; use your cluster's master URL instead
spk = SparkSession.builder.master(master).getOrCreate()

Basically, are there any differences between running the script via python and running it via spark-submit?

Luke W asked Jun 01 '17 15:06


1 Answer

spark-submit is mostly a convenience tool. It lets you set the desired configuration, environment variables, and other options at submit time.

It also lets you set JVM options, which cannot be changed on a virtual machine that is already running. Because the JVM is initialized as soon as the Spark configuration is created, the same cannot be done from within the running Python process.
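To make the "options on submit" point concrete, here is a minimal sketch that assembles a spark-submit command line carrying a driver memory setting, a driver JVM flag, and an ordinary configuration property. The helper `build_spark_submit_command`, the script name `my_job.py`, and all option values are hypothetical and chosen only for illustration; the flags themselves (`--master`, `--driver-memory`, `--driver-java-options`, `--conf`) are standard spark-submit options.

```python
import shlex


def build_spark_submit_command(script, master, driver_memory=None,
                               driver_java_options=None, conf=None):
    """Assemble a spark-submit argument list.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    cmd = ["spark-submit", "--master", master]
    if driver_memory:
        # Driver heap size has to be known before the driver JVM starts.
        cmd += ["--driver-memory", driver_memory]
    if driver_java_options:
        # Extra JVM flags for the driver, also fixed at JVM startup.
        cmd += ["--driver-java-options", driver_java_options]
    for key, value in sorted((conf or {}).items()):
        # Ordinary Spark properties are passed as repeated --conf key=value pairs.
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(script)
    return cmd


# All values below are placeholders, not recommendations.
command = build_spark_submit_command(
    "my_job.py",
    master="local[*]",
    driver_memory="4g",
    driver_java_options="-XX:+UseG1GC",
    conf={"spark.sql.shuffle.partitions": "64"},
)

# Print a shell-safe version of the command (shlex.join needs Python 3.8+).
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(command)` on a machine where spark-submit is on the PATH. Either way the JVM-level settings are in place before the driver JVM exists, which is the case the answer describes.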

user8098908 answered Oct 30 '22 01:10