I am running an IPython notebook on my local machine and want to connect to a remote Spark server (by its IP address), then read in data from an HDFS folder on that remote server. How can I create such a remote connection to a Spark server from a local IPython notebook?
Are there any particular reasons the notebook has to run on your local machine? If not, it is as easy as:
1. Install Jupyter/IPython on the remote machine running Spark:

   remote$ pip install "jupyter[all]"

2. Modify spark-env.sh and add these two lines:

   export PYSPARK_PYTHON=/usr/bin/python2.7        # your location may vary
   export PYSPARK_DRIVER_PYTHON=/usr/local/bin/ipython

3. Launch pyspark:

   PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7777" pyspark

4. On your local machine, set up an SSH tunnel:

   ssh -i private_key -N -f -L localhost:7776:localhost:7777 [email protected]

5. In your local browser, visit http://localhost:7776.
You may want to run step 3 behind screen/tmux so the notebook server keeps running for longer sessions.
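Once the notebook loads, the pyspark launcher has already created a SparkContext for you as sc, so reading from the cluster's HDFS is straightforward. A minimal sketch (the namenode host/port and folder path below are placeholders for your setup):

   # `sc` is the SparkContext that the pyspark launcher provides to the notebook.
   # Replace the namenode host/port and path with your cluster's values.
   rdd = sc.textFile("hdfs://namenode-host:8020/path/to/folder")
   print(rdd.take(5))   # peek at the first few records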
Some helpful pages:
[1] http://jupyter-notebook.readthedocs.org/en/latest/public_server.html
[2] http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step
You could try SparkMagic.
SparkMagic is a client for Livy that runs inside Jupyter notebooks: you write Spark code in your local Jupyter client, and SparkMagic submits the job to the remote cluster through Livy. With SparkMagic + Jupyter, you can keep the notebook running on localhost and use it to drive a remote Spark cluster.
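For example, here is a sketch of what the local notebook could look like, assuming you have done pip install sparkmagic and a Livy server is running on the cluster; the http://livy-host:8998 URL and the session name mysession are placeholders, and each block below goes in its own notebook cell:

   # Cell 1: load the SparkMagic extension.
   %load_ext sparkmagic.magics

   # Cell 2: create a remote PySpark session via Livy.
   # -s names the session, -l picks the language, -u is the Livy endpoint.
   %spark add -s mysession -l python -u http://livy-host:8998

   # Cell 3: code under %%spark runs on the remote cluster,
   # where the session exposes a SparkSession as `spark`.
   %%spark
   df = spark.read.text("hdfs:///path/to/folder")
   df.show(5)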