 

What to set `SPARK_HOME` to?


Installed apache-maven-3.3.3 and scala 2.11.6, then ran:

$ git clone git://github.com/apache/spark.git -b branch-1.4
$ cd spark
$ build/mvn -DskipTests clean package

Finally:

$ git clone https://github.com/apache/incubator-zeppelin
$ cd incubator-zeppelin/
$ mvn install -DskipTests

Then ran the server:

$ bin/zeppelin-daemon.sh start 

Running a simple notebook beginning with %pyspark, I got an error about py4j not being found, so I just did pip install py4j (ref).
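(A quick sanity check that py4j is importable after the install:)

$ python -c "import py4j"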

Now I'm getting this error:

pyspark is not responding
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 22, in <module>
    from pyspark.conf import SparkConf
ImportError: No module named pyspark.conf

I've tried setting my SPARK_HOME to: /spark/python:/spark/python/lib. No change.

asked Jun 14 '15 by A T


People also ask

What is my SPARK_HOME?

The SPARK_HOME variable is the directory/folder where Sparkling Water will find the Spark runtime.

How do you set a PySpark path?

Before starting PySpark, you need to set the following environment variables to set the Spark path and the Py4j path. Or, to set them globally, put them in the .bashrc file, then run the following command for the environment variables to take effect.
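A minimal sketch of what those .bashrc entries typically look like, assuming Spark lives under /spark (paths and the py4j version vary with your install):

# in ~/.bashrc:
export SPARK_HOME=/spark
export PATH=$PATH:$SPARK_HOME/bin
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-VERSION-src.zip:$PYTHONPATH

Then reload the file so the variables take effect:

$ source ~/.bashrc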


1 Answer

Two environment variables are required:

SPARK_HOME=/spark
PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-VERSION-src.zip:$PYTHONPATH
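The exact py4j zip name depends on the Spark build; it can be read straight off the checkout. To make the variables stick for Zeppelin, one option (assuming the stock conf/ layout, where zeppelin-env.sh is created from its .template) is to export them there:

$ ls /spark/python/lib/py4j-*-src.zip   # shows the exact name to substitute for VERSION

# in incubator-zeppelin/conf/zeppelin-env.sh:
export SPARK_HOME=/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-VERSION-src.zip:$PYTHONPATH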
answered Sep 23 '22 by ChromeHearts