I am running an IPython notebook on my local machine and want to connect to a remote Spark server (by its IP address), then read in data from an HDFS folder on that remote server. How can I create such a remote connection to a Spark server from a local IPython notebook?
Are there any particular reasons the notebook has to run on your local machine? If not, it is as easy as:
1. Install Jupyter/IPython on the remote machine running Spark:

   remote$ pip install "jupyter[all]"

2. Modify spark-env.sh and add these two lines:

   export PYSPARK_PYTHON=/usr/bin/python2.7        # your location may vary
   export PYSPARK_DRIVER_PYTHON=/usr/local/bin/ipython

3. Launch pyspark:

   PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7777" pyspark

4. On your local machine, set up an SSH tunnel:

   ssh -i private_key -N -f -L localhost:7776:localhost:7777 [email protected]

5. In your local browser, visit http://localhost:7776.
You may want to run step 3 behind screen/tmux so the notebook server keeps running for longer sessions.
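Once the notebook loads, the pyspark launcher has already created a SparkContext for you as sc, so reading from the cluster's HDFS is straightforward. A minimal sketch (the namenode host/port and folder path below are placeholders for your setup):

   # `sc` is the SparkContext that the pyspark launcher provides to the notebook.
   # Replace the namenode host/port and path with your cluster's values.
   rdd = sc.textFile("hdfs://namenode-host:8020/path/to/folder")
   print(rdd.take(5))   # peek at the first few records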
Some helpful pages:
[1] http://jupyter-notebook.readthedocs.org/en/latest/public_server.html
[2] http://blog.insightdatalabs.com/jupyter-on-apache-spark-step-by-step
You could try SparkMagic.
SparkMagic is a client for Livy that runs inside Jupyter notebooks: you write Spark code in your local Jupyter client, and SparkMagic submits the job to the remote cluster through Livy. With SparkMagic + Jupyter, you can keep the notebook running on localhost and use it to drive a remote Spark cluster.
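For example, here is a sketch of what the local notebook could look like, assuming you have done pip install sparkmagic and a Livy server is running on the cluster; the http://livy-host:8998 URL and the session name mysession are placeholders, and each block below goes in its own notebook cell:

   # Cell 1: load the SparkMagic extension.
   %load_ext sparkmagic.magics

   # Cell 2: create a remote PySpark session via Livy.
   # -s names the session, -l picks the language, -u is the Livy endpoint.
   %spark add -s mysession -l python -u http://livy-host:8998

   # Cell 3: code under %%spark runs on the remote cluster,
   # where the session exposes a SparkSession as `spark`.
   %%spark
   df = spark.read.text("hdfs:///path/to/folder")
   df.show(5)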