Installed apache-maven-3.3.3 and Scala 2.11.6, then ran:
```
$ git clone git://github.com/apache/spark.git -b branch-1.4
$ cd spark
$ build/mvn -DskipTests clean package
```
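(To sanity-check the build before moving on, the stock `spark-submit` launcher can print the version it was built as, using only the standard `--version` flag:

```
$ bin/spark-submit --version
```

It should report the branch-1.4 version just built.)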
Finally:
```
$ git clone https://github.com/apache/incubator-zeppelin
$ cd incubator-zeppelin/
$ mvn install -DskipTests
```
Then ran the server:
```
$ bin/zeppelin-daemon.sh start
```
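(The same script can confirm the daemon actually came up; `status` is one of the sub-commands `zeppelin-daemon.sh` accepts alongside `start`, `stop`, and `restart`, and by default the web UI listens on port 8080:

```
$ bin/zeppelin-daemon.sh status
$ curl -s http://localhost:8080 >/dev/null && echo "Zeppelin UI is up"
```

Both checks passed for me.)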
Running a simple notebook beginning with `%pyspark`, I got an error about `py4j` not being found. Just did `pip install py4j` (ref).
Now I'm getting this error:
```
pyspark is not responding
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 22, in <module>
    from pyspark.conf import SparkConf
ImportError: No module named pyspark.conf
```
I've tried setting my `SPARK_HOME` to `/spark/python:/spark/python/lib`. No change.
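For reference, the same `ImportError` is reproducible outside Zeppelin whenever `/spark/python` is missing from `PYTHONPATH` (a minimal check; it assumes Spark was cloned to `/spark` as above and that `py4j` is importable, which the `pip install` took care of):

```
$ python -c "from pyspark.conf import SparkConf"    # same ImportError with no PYTHONPATH
$ PYTHONPATH=/spark/python python -c "from pyspark.conf import SparkConf"    # import succeeds
```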
The `SPARK_HOME` variable is the directory where the Spark runtime is installed; Zeppelin's PySpark interpreter uses it to locate Spark, so it must point to the top of the Spark tree (here `/spark`), not to the `python` subdirectories.
Before starting PySpark, you need to set the following environment variables so that both the Spark path and the Py4j path can be found. To set them globally, put them in your `~/.bashrc` file, then reload it (e.g. `source ~/.bashrc`) so they take effect.
Two environment variables are required:
```
export SPARK_HOME=/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-VERSION-src.zip:$PYTHONPATH
```
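`VERSION` is a placeholder for whichever py4j zip your Spark build ships under `python/lib`. Rather than hard-coding it, a small sketch that resolves it automatically (assuming Spark lives in `/spark` and exactly one `py4j-*-src.zip` is present, as in a branch-1.4 build):

```
# lines for ~/.bashrc: resolve the bundled py4j zip instead of hard-coding VERSION
export SPARK_HOME=/spark
PY4J_ZIP=$(ls "$SPARK_HOME"/python/lib/py4j-*-src.zip | head -n 1)
export PYTHONPATH="$SPARK_HOME/python:$PY4J_ZIP:$PYTHONPATH"
```

After re-sourcing `~/.bashrc`, restart the daemon (`bin/zeppelin-daemon.sh restart`) so the `%pyspark` interpreter process inherits the new environment.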