I followed this link to install Spark in Standalone mode on a cluster by placing pre-built versions of Spark on each node of the cluster, then running ./sbin/start-master.sh on the master and ./sbin/start-slave.sh <master-spark-URL> on each slave. How do I continue from there to set up a pyspark application, for example in an IPython notebook, that utilizes the cluster?
Do I need to install IPython on my local machine (laptop)?
To use IPython to run pyspark, you'll need to add the following environment variables to your .bashrc:
export PYSPARK_DRIVER_PYTHON=ipython2 # As pyspark only works with python2 and not python3
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
This will cause an ipython2 notebook to be launched when you execute pyspark from the shell.
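To have that notebook actually use your standalone cluster, point the driver at the master. A minimal sketch, assuming a master URL of spark://master-host:7077 and an app name of "notebook-test" (both placeholders; use the spark:// address shown on your master's web UI): either launch with ./bin/pyspark --master spark://master-host:7077, in which case a SparkContext is already available as sc, or build the context yourself in a notebook cell:

from pyspark import SparkConf, SparkContext

# Hypothetical master URL and app name; replace with your own values
conf = SparkConf().setMaster("spark://master-host:7077").setAppName("notebook-test")
sc = SparkContext(conf=conf)

# Quick sanity check that work is actually sent to the cluster
print(sc.parallelize(range(1000)).sum())

If the job shows up under "Running Applications" in the master's web UI, the notebook is using the cluster.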
Note: I assume you already have IPython notebook installed. If not, the easiest way to get it is to install the Anaconda Python distribution.