 

How to run script in Pyspark and drop into IPython shell when done?

I want to run a spark script and drop into an IPython shell to interactively examine data.

I have tried both:

$ IPYTHON=1 pyspark --master local[2] myscript.py

and

$ IPYTHON=1 spark-submit --master local[2] myscript.py

but both exit IPython once the script finishes.

This seems really simple, but I can't find how to do it anywhere.
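
For reference, the kind of script I'm running is roughly this (myscript.py is just a placeholder; SparkContext.getOrCreate() assumes Spark 1.4+):

# myscript.py -- placeholder example
from pyspark import SparkContext

sc = SparkContext.getOrCreate()        # reuse the shell's context if one already exists
rdd = sc.parallelize(range(100))       # small in-memory dataset
squares = rdd.map(lambda x: x * x)     # transformation I want to poke at interactively
print(squares.take(5))                 # action, so the script actually computes something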

asked Sep 19 '14 by lollercoaster


People also ask

How do I run PySpark code in shell?

Go to the Spark installation directory from the command line, type bin/pyspark and press Enter; this launches the PySpark shell and gives you a prompt to interact with Spark in Python. If Spark's bin directory is already on your PATH, just enter pyspark in the terminal (including on macOS).
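
As a sketch (assuming Spark is unpacked at /path/to/spark, a placeholder):

$ cd /path/to/spark
$ ./bin/pyspark          # launches the PySpark shell with a SparkContext bound to sc
# or, if Spark's bin directory is already on your PATH:
$ pyspark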

How do I run a Python script in IPython?

To do this, first open a terminal to get a command prompt. Then type the command ipython3 and press Enter. You can then write Python commands and execute them by pressing Enter. Note that IPython supports tab completion.
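
For example, a typical session might look like this (myscript.py is a placeholder):

$ ipython3
In [1]: x = 2 + 2
In [2]: x
Out[2]: 4
In [3]: %run myscript.py    # runs the script; its top-level variables stay in the session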

How do I get existing SparkContext?

In Spark/PySpark you can get the currently active SparkContext and its configuration settings by accessing spark.sparkContext.getConf.getAll(), where spark is a SparkSession object and getAll() returns Array[(String, String)]. This works both in Spark with Scala and in PySpark (Spark with Python).
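
In PySpark the equivalent call uses parentheses on getConf(); a minimal sketch, assuming Spark 2.x or later where SparkSession is available (the app name and master are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("conf-demo").getOrCreate()
sc = spark.sparkContext                     # the currently active SparkContext
for key, value in sc.getConf().getAll():    # list of (key, value) configuration pairs
    print(key, "=", value)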

How do you call a spark submit from a Python script?

The Apache Spark binary distribution comes with the spark-submit.sh script for Linux and macOS, and spark-submit.cmd for Windows. These scripts live in the $SPARK_HOME/bin directory and are used to submit a PySpark file (a .py file, i.e. Spark with Python) to the cluster.
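
For example (myscript.py is a placeholder):

$ $SPARK_HOME/bin/spark-submit --master local[2] myscript.py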


2 Answers

If you launch the IPython shell with:

$ IPYTHON=1 pyspark --master local[2]

you can do:

>>> %run myscript.py

and all variables will stay in the workspace. You can also debug step by step with:

>>> %run -d myscript.py
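
For example, with the placeholder myscript.py sketched in the question above, a session after %run might look like this (names and numbers are illustrative):

>>> %run myscript.py
[0, 1, 4, 9, 16]
>>> squares.count()      # top-level variables from the script are still defined
100
>>> sc                   # the shell's SparkContext is still available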
answered Oct 19 '22 by elyase


Launch the IPython shell using IPYTHON=1 pyspark, then run execfile('/path/to/myscript.py'); that will run your script inside the shell and return control to it.
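
Note that execfile() only exists in Python 2; under a Python 3 interpreter the rough equivalent is:

>>> exec(open('/path/to/myscript.py').read())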

answered Oct 19 '22 by Alaa Ali