Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to load IPython shell with PySpark

I want to load IPython shell (not IPython notebook) in which I can use PySpark through command line. Is that possible? I have installed Spark-1.4.1.

like image 279
pg2455 Avatar asked Aug 06 '15 17:08

pg2455


People also ask

How do I run a shell command in Pyspark?

Go to the Spark Installation directory from the command line and type bin/pyspark and press enter, this launches pyspark shell and gives you a prompt to interact with Spark in Python language. If you have set the Spark in a PATH then just enter pyspark in command line or terminal (mac users).

How do I run IPython shell?

You start IPython by typing “ipython” in your terminal. $ ipython Python 2.7. 2 (default, Jun 20 2012, 16:23:33) Type "copyright", "credits" or "license" for more information. IPython 0.13.

How do I import IPython into Python?

No more typing “import pandas as pd” 10 times a dayCreate a folder called startup if it's not already there. Add a new Python file called start.py. Put your favorite imports in this file. Launch IPython or a Jupyter Notebook and your favorite libraries will be automatically loaded every time!


3 Answers

If you use Spark < 1.2 you can simply execute bin/pyspark with an environmental variable IPYTHON=1.

IPYTHON=1 /path/to/bin/pyspark

or

export IPYTHON=1
/path/to/bin/pyspark

While above will still work on the Spark 1.2 and above recommended way to set Python environment for these versions is PYSPARK_DRIVER_PYTHON

PYSPARK_DRIVER_PYTHON=ipython /path/to/bin/pyspark

or

export PYSPARK_DRIVER_PYTHON=ipython
/path/to/bin/pyspark

You can replace ipython with a path to the interpreter of your choice.

like image 163
zero323 Avatar answered Oct 11 '22 11:10

zero323


This answer is an adapted and shortened version of a similar post my website: https://jupyter.ai/pyspark-session/

I use ptpython(1), which supplies ipython functionality as well as your choice of either vi(1) or emacs(1) key-bindings. It also supplies dynamic code pop-up/intelligence, which is extremely useful when performing ad-hoc SPARK work on the CLI, or simply trying to learn the Spark API.

Here is what my vi-enabled ptpython session looks like, taking note of the VI (INSERT) mode at the bottom of the screehshot, as well as the ipython style prompt to indicate that those ptpython capabilities have been selected (more on how to select them in a moment):

enter image description here

To get all of this, perform the following simple steps:

user@linux$ pip3 install ptpython # Everything here assumes Python3

user@linux$ vi ${SPARK_HOME}/conf/spark-env.sh
    # Comment-out/disable the following two lines. This is necessary because
    # they take precedence over any UNIX environment settings for them:
       # PYSPARK_PYTHON=/path/to/python
       # PYSPARK_DRIVER_PYTHON=/path/to/python

user@linux$ vi ${HOME}/.profile # Or whatever your login RC-file is.
    # Add these two lines:
       export PYSPARK_PYTHON=python3           # Fully-Qualify this if necessary. (python3)
       export PYSPARK_DRIVER_PYTHON=ptpython3  # Fully-Qualify this if necessary. (ptpython3)

user@linux$ . ${HOME}/.profile  # Source the RC file.

user@linux$ pyspark
    # You are now running pyspark(1) within ptpython; a code pop-up/interactive
    # shell; with your choice of vi(1) or emacs(1) key-bindings; and 
    # your choice of ipython functionality or not.

To select your pypython preferences (and there are a bunch of them), simply press F2 from within a ptpython session, and select whatever options you want.

CLOSING NOTE: If you are submitting a Python Spark Application (as opposed to interacting with pyspark(1) via the CLI, as shown above), simply set PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON programmatically in Python, like so:

os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3' # Not 'ptpython3' in this case.

I hope this answer and setup is useful.

like image 24
NYCeyes Avatar answered Oct 11 '22 09:10

NYCeyes


if version of spark >= 2.0 and the follow config could be adding to .bashrc

export PYSPARK_PYTHON=/data/venv/your_env/bin/python
export PYSPARK_DRIVER_PYTHON=/data/venv/your_env/bin/ipython
like image 8
shengshan zhang Avatar answered Oct 11 '22 10:10

shengshan zhang