The documentation on spark-submit says the following:
The spark-submit script in Spark’s bin directory is used to launch applications on a cluster.
Regarding pyspark, it says the following:
You can also use bin/pyspark to launch an interactive Python shell.
This question may sound stupid, but when I run commands through pyspark, they also run on the cluster, right? They do not run on the master node only, right?
There is no practical difference between the two. If not configured otherwise, both will execute code in local mode. If a master is configured (either with the --master command-line parameter or the spark.master configuration property), the corresponding cluster will be used to execute the program.
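For instance, a minimal sketch of both launch paths (the standalone master URL spark://host:7077 and the file app.py are placeholders, not values from the question):

```
# With no --master flag (and no spark.master set), both run in local mode.
bin/pyspark                                          # interactive shell, local mode
bin/spark-submit app.py                              # batch application, local mode

# With a master configured, both target the same cluster manager.
bin/pyspark --master spark://host:7077               # interactive shell on a standalone cluster
bin/spark-submit --master spark://host:7077 app.py   # batch application on the same cluster

# Equivalently, set spark.master in conf/spark-defaults.conf instead of passing --master.
```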
If you are using EMR, there are three things; although all three will run the application on the Spark cluster, there is a difference in how the driver program works.
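As a sketch of that driver-placement difference (these are standard spark-submit options; app.py is a placeholder, and on EMR the master defaults to YARN):

```
# Client mode: the driver runs on the machine you launch from
# (on EMR, typically the master node); only the executors run on the cluster.
spark-submit --master yarn --deploy-mode client app.py

# Cluster mode: the driver itself runs inside a YARN container on the cluster.
spark-submit --master yarn --deploy-mode cluster app.py

# An interactive shell such as pyspark always keeps its driver where you start
# it (client mode); the executors are still distributed across the cluster.
```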