I wanted to install pyspark on my home machine. I did

pip install pyspark
pip install jupyter

Both seemed to work well. But when I try to run pyspark I get

pyspark
Could not find valid SPARK_HOME while searching ['/home/user', '/home/user/.local/bin']

What should SPARK_HOME be set to?
Go to the Spark installation directory from the command line, type bin/pyspark, and press Enter. This launches the PySpark shell and gives you a prompt to interact with Spark in Python. If you have added Spark to your PATH, just enter pyspark in the command line or terminal (Mac users).

For Python users, PySpark also provides a pip installation from PyPI. This is usually for local usage or as a client to connect to a cluster, rather than for setting up a cluster itself. The PySpark installation docs cover installing by using pip, Conda, downloading manually, and building from source.

To test whether your installation was successful on Windows, open Command Prompt, change to the SPARK_HOME directory, and type bin\pyspark. This should start the PySpark shell, which can be used to work interactively with Spark.
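If you installed a Spark distribution manually, a minimal sketch of that setup could look like this (assuming the archive was unpacked to /opt/spark, which is only an example path; adjust it to wherever you put Spark):

# in ~/.bashrc (or ~/.zshrc): example path, not the one pip uses
export SPARK_HOME=/opt/spark
export PATH="$SPARK_HOME/bin:$PATH"

After running source ~/.bashrc, typing pyspark anywhere should launch the shell.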
I just faced the same issue, but it turned out that pip install pyspark downloads a Spark distribution that works well in local mode. Pip just doesn't set an appropriate SPARK_HOME. But when I set it manually, pyspark works like a charm (without downloading any additional packages).
$ pip3 install --user pyspark
Collecting pyspark
Downloading pyspark-2.3.0.tar.gz (211.9MB)
100% |████████████████████████████████| 211.9MB 9.4kB/s
Collecting py4j==0.10.6 (from pyspark)
Downloading py4j-0.10.6-py2.py3-none-any.whl (189kB)
100% |████████████████████████████████| 194kB 3.9MB/s
Building wheels for collected packages: pyspark
Running setup.py bdist_wheel for pyspark ... done
Stored in directory: /home/mario/.cache/pip/wheels/4f/39/ba/b4cb0280c568ed31b63dcfa0c6275f2ffe225eeff95ba198d6
Successfully built pyspark
Installing collected packages: py4j, pyspark
Successfully installed py4j-0.10.6 pyspark-2.3.0
$ PYSPARK_PYTHON=python3 SPARK_HOME=~/.local/lib/python3.5/site-packages/pyspark pyspark
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
2018-03-31 14:02:39 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/
Using Python version 3.5.2 (default, Nov 23 2017 16:37:01)
>>>
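If you don't want to hard-code the Python version in that site-packages path, one option (just a sketch, assuming pyspark is importable by the interpreter you call) is to derive SPARK_HOME from the installed package location:

# Point SPARK_HOME at the pyspark package that pip installed
export SPARK_HOME="$(python3 -c 'import pyspark, os; print(os.path.dirname(pyspark.__file__))')"
pyspark

This resolves to the same .../site-packages/pyspark directory shown above, so it keeps working if you upgrade Python or reinstall pyspark somewhere else.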