I'm trying to set up PySpark on my desktop and interact with it via the terminal. I'm following this guide,
http://jmedium.com/pyspark-in-python/
When I run 'pyspark' in the terminal it says,
/home/jacob/spark-2.1.0-bin-hadoop2.7/bin/pyspark: line 45: python: command not found
env: ‘python’: No such file or directory
I've followed several guides, which all lead to this same issue (some differ in the details of setting up the .profile, but so far none have worked). I have Java, Python 3.6, and Scala installed. My .profile is configured as follows:
#Spark and PySpark Setup
PATH="$HOME/bin:$HOME/.local/bin:$PATH"
export SPARK_HOME='/home/jacob/spark-2.1.0-bin-hadoop2.7'
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
#export PYSPARK_DRIVER_PYTHON="jupyter"
#export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3.6.5
Note that the Jupyter notebook lines are commented out because I want to launch pyspark in the shell right now without the notebook starting.
Interestingly, spark-shell launches just fine.
I'm using Ubuntu 18.04.1 and Spark 2.1.
See images.
I've tried every guide I can find, and since this is my first time setting up Spark, I'm not sure how to troubleshoot it from here.
Thank you
[Images: attempting to execute pyspark; .profile; versions]
You should set export PYSPARK_PYTHON=python3 instead of export PYSPARK_PYTHON=python3.6.5 in your .profile, and then run source .profile, of course.
That worked for me.
The other option, installing Python with sudo apt install python (which is for 2.x), is not appropriate.
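A minimal sketch of that fix for .profile, assuming a POSIX shell: rather than hard-coding an interpreter name, it picks the first candidate that actually exists on PATH. The helper name pick_python is made up for illustration and is not part of Spark. Note that the executable is normally named python3 or python3.6; a full version string such as python3.6.5 is not a command name.

```shell
# pick_python (hypothetical helper): print the first argument that
# resolves to a command on PATH; return non-zero if none does.
pick_python() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Prefer python3, fall back to python; leave PYSPARK_PYTHON unset if neither exists.
if chosen=$(pick_python python3 python); then
  export PYSPARK_PYTHON="$chosen"
fi
```

After editing .profile, run source .profile (or log in again) so the new value takes effect.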
For those who may come across this, I figured it out!
I specifically chose to use an older version of Spark in order to follow along with a tutorial I was watching - Spark 2.1.0. I did not know that the latest version of Python (3.5.6 at the time of writing this) is incompatible with Spark 2.1. Thus PySpark would not launch.
I solved this by using Python 2.7 and setting the path accordingly in .bashrc:
export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7
export PYSPARK_PYTHON=python2.7
People using Python 3.8 and Spark <= 2.4.5 will have the same problem.
In this case, the only solution I found is to upgrade Spark to version 3.0.0.
See https://bugs.python.org/issue38775
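The version rule stated above can be sketched as a small shell check. This only encodes the claim from this answer (Python 3.8 with Spark <= 2.4.5); treating every Python version from 3.8 upward as affected is my assumption. The function names version_le and is_affected are hypothetical, and sort -V assumes GNU coreutils, as on Ubuntu.

```shell
# version_le A B: succeeds when dotted version A <= B.
# Relies on GNU sort's -V (version sort) option.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# is_affected PYTHON_VERSION SPARK_VERSION: succeeds when the pair matches
# the rule above (Python >= 3.8 is an assumption, Spark <= 2.4.5).
is_affected() {
  version_le 3.8 "$1" && version_le "$2" 2.4.5
}

# Example call:
#   is_affected 3.8.2 2.4.5 && echo "upgrade Spark or use an older Python"
```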
For GNU/Linux users who have the python3 package installed (especially on Ubuntu/Debian distros), there is a package called "python-is-python3" that makes the python command point to python3.
# apt install python-is-python3
Python 2.7 is deprecated now (2020, Ubuntu 20.10), so do not try installing it.
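To confirm that the python command is available after installing the package, a quick check along these lines may help. It is a sketch for a POSIX shell; check_command is a made-up name, not a standard tool.

```shell
# check_command (hypothetical helper): report where a command resolves,
# or return non-zero if it is not on PATH. Defaults to "python".
check_command() {
  name="${1:-python}"
  if resolved=$(command -v "$name"); then
    printf '%s resolves to %s\n' "$name" "$resolved"
  else
    printf '%s is not on PATH\n' "$name"
    return 1
  fi
}

# Example call:
#   check_command python
```

If it reports that python is not on PATH, pyspark from Spark 2.1.0 will fail with the same "python: command not found" message shown in the question.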