 

How to create a connection to a remote Spark server and read in data from ipython running on local machine?

I am running an ipython notebook on my local machine and want to connect it to a remote Spark server by its IP address, then read in data from an HDFS folder on that remote server. How can I create such a remote connection to a Spark server from a local ipython notebook?

asked Nov 23 '15 by user2966197

2 Answers

Is there any particular reason the notebook has to run on your local machine? If not, it is as easy as:

  1. Install jupyter/ipython on the remote machine running spark
    remote$ pip install "jupyter[all]"

  2. Modify spark-env.sh on the remote machine and add these two lines
    export PYSPARK_PYTHON=/usr/bin/python2.7 # your location may vary
    export PYSPARK_DRIVER_PYTHON=/usr/local/bin/ipython

  3. Launch pyspark
    PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7777" pyspark

  4. On your local machine, set up an ssh tunnel
    ssh -i private_key -N -f -L localhost:7776:localhost:7777 remote_user@remote_host

  5. On your local browser, visit http://localhost:7776

You may want to run step 3 inside screen/tmux so the notebook server keeps running after you disconnect.
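Once you reach step 5, the notebook in your browser is actually driven by the remote pyspark process, so the `SparkContext` is already available as `sc`. As a minimal sketch, reading data from HDFS might then look like this (the HDFS path is a placeholder, not one from the question):

```python
# Inside a notebook cell served by the remote pyspark driver:
# `sc` is the SparkContext that pyspark created at launch.
# The path below is a hypothetical example; use your own HDFS folder.
rdd = sc.textFile("hdfs:///user/remote_user/data/")

# Pull a small sample back to the driver to inspect it.
for line in rdd.take(5):
    print(line)
```

Because the driver runs on the remote machine, the `hdfs://` path is resolved against the cluster's HDFS, not your local filesystem.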

Some helpful pages:
[1]. http://jupyter-notebook.readthedocs.org/en/latest/public_server.html
[2]. http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step

answered Nov 01 '22 by covariantmonkey


You could try SparkMagic.

SparkMagic is a client for Livy that runs inside Jupyter notebooks. When you write Spark code in the local Jupyter client, SparkMagic submits the Spark job to the remote cluster through Livy.

With SparkMagic + Jupyter, you can keep the notebook running on localhost and use it to connect to a remote Spark cluster.
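As a rough sketch of the setup (assuming a Livy server is already running on the cluster): after `pip install sparkmagic`, SparkMagic is pointed at the remote Livy endpoint via `~/.sparkmagic/config.json`. The host and port below are placeholders:

```json
{
  "kernel_python_credentials": {
    "username": "",
    "password": "",
    "url": "http://remote-spark-host:8998",
    "auth": "None"
  }
}
```

After installing one of the SparkMagic kernels (e.g. the PySpark kernel it ships with), cells in that kernel execute on the remote cluster via Livy rather than locally.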

answered Nov 01 '22 by Sanjib Mitra