Are there pros/cons, or maybe different use cases for using spark-submit to submit a python script vs. simply running a .py file with the python executable (and importing SparkSession), like this?
from pyspark.sql import SparkSession
spk = SparkSession.builder.master(master).getOrCreate()
Basically, are there any differences between running the script via python and running it via spark-submit?
spark-submit is mostly a convenience method. It allows you to set all desired configuration, environment variables, and other options on submit.
It also allows you to set JVM options, which cannot be changed on a virtual machine that is already running. Since the JVM is initialized as soon as the Spark configuration is created, it is not possible to set them from the running Python process.
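To make "set everything on submit" concrete, here is a minimal sketch that assembles a spark-submit command line in Python. The helper name build_spark_submit_command, the script name my_job.py, and all option values are hypothetical and chosen only for illustration; the flags themselves (--master, --driver-memory, --driver-java-options, --conf) are standard spark-submit options. The sketch only builds and prints the command, it does not launch Spark.

```python
import shlex


def build_spark_submit_command(script, master="local[2]", driver_memory="2g",
                               driver_java_options="-XX:+UseG1GC", conf=None):
    """Hypothetical helper: return the argv list for a spark-submit call."""
    cmd = [
        "spark-submit",
        "--master", master,
        # JVM-level settings for the driver are passed before the JVM starts.
        "--driver-memory", driver_memory,
        "--driver-java-options", driver_java_options,
    ]
    # Any other Spark property can be passed as --conf key=value.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(script)
    return cmd


# Example values are assumptions, not recommendations.
command = build_spark_submit_command(
    "my_job.py",
    conf={"spark.sql.shuffle.partitions": "8"},
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(command))
```

The resulting list could be handed to subprocess.run(command) if spark-submit is on the PATH. The point is that the driver's memory and Java options are fixed on the command line, before any JVM exists.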