I want to run a spark script and drop into an IPython shell to interactively examine data.
I tried both:
$ IPYTHON=1 pyspark --master local[2] myscript.py
and
$ IPYTHON=1 spark-submit --master local[2] myscript.py
but both exit out of IPython once the script is done.
This seems really simple, but I can't find how to do it anywhere.
Go to the Spark installation directory from the command line, type bin/pyspark, and press Enter; this launches the PySpark shell and gives you a prompt to interact with Spark in Python. If Spark is on your PATH, just enter pyspark at the command line or terminal (Mac users).
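For example, assuming Spark is installed under /opt/spark (adjust the path to your own installation), something like this should drop you into the shell:
$ cd /opt/spark
$ bin/pyspark --master local[2]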
To use IPython on its own, open a terminal to get a command prompt, type ipython3, and press Enter. You can then write Python commands and execute them by pressing Enter. Note that IPython supports tab completion.
In Spark/PySpark you can get the current active SparkContext and its configuration settings by calling spark.sparkContext.getConf().getAll(), where spark is a SparkSession object and getAll() returns the settings as key/value pairs (Array[(String, String)] in Scala, a list of tuples in PySpark); the same pattern works in Spark with Scala and in PySpark (Spark with Python).
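A minimal sketch from the PySpark shell (assuming Spark 2.x or later, where the shell already provides a spark SparkSession; in older shells you would go through sc directly):
>>> # 'spark' is the SparkSession created by the pyspark shell
>>> for key, value in spark.sparkContext.getConf().getAll():
...     print(key, value)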
Apache Spark's binary distribution comes with a spark-submit shell script for Linux and Mac, and a spark-submit.cmd command file for Windows; these scripts live in the $SPARK_HOME/bin directory and are used to submit a PySpark file with a .py extension (Spark with Python) to the cluster.
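For example (the script path and master URL below are placeholders to adapt to your setup):
$ $SPARK_HOME/bin/spark-submit --master local[2] /path/to/myscript.py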
If you launch the IPython shell with:
$ IPYTHON=1 pyspark --master local[2]
you can do:
>>> %run myscript.py
and all variables will stay in the workspace. You can also debug step by step with:
>>> %run -d myscript.py
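For instance, if myscript.py builds an RDD (the name lines below is hypothetical, standing in for whatever your script defines), you can keep working with it after %run returns:
>>> %run myscript.py
>>> lines.count()   # 'lines' was defined inside myscript.py
>>> lines.take(5)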
Launch the IPython shell using IPYTHON=1 pyspark, then run execfile('/path/to/myscript.py'); that should run your script inside the shell and return you to the prompt afterwards.
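Note that execfile() only exists in Python 2; if your shell is running Python 3, the equivalent is to read and exec the file yourself:
>>> exec(open('/path/to/myscript.py').read())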