 

PySpark Will not start - ‘python’: No such file or directory

I'm trying to set up pyspark on my desktop and interact with it via the terminal. I'm following this guide:

http://jmedium.com/pyspark-in-python/

When I run 'pyspark' in the terminal, it says:

/home/jacob/spark-2.1.0-bin-hadoop2.7/bin/pyspark: line 45: python:
command not found
env: ‘python’: No such file or directory

I've followed several guides which all lead to this same issue (some differ in the details of setting up the .profile; so far none have worked). I have Java, Python 3.6, and Scala installed. My .profile is configured as follows:

#Spark and PySpark Setup
PATH="$HOME/bin:$HOME/.local/bin:$PATH"
export SPARK_HOME='/home/jacob/spark-2.1.0-bin-hadoop2.7'
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
#export PYSPARK_DRIVER_PYTHON="jupyter"
#export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3.6.5

Note that the Jupyter notebook lines are commented out because I want to launch pyspark in the shell for now, without the notebook starting.
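For reference, the bin/pyspark launcher asks env for the interpreter named in PYSPARK_PYTHON, and env reports exactly this error for any name it cannot find on the PATH. A minimal sketch of the mechanism (the interpreter name below is deliberately fake, standing in for a bad setting such as python3.6.5):

```shell
# env/command fail the same way for any name that is not on the PATH.
PYSPARK_PYTHON=definitely-not-an-interpreter   # stand-in for a bad setting
if command -v "$PYSPARK_PYTHON" >/dev/null 2>&1; then
  echo "found $PYSPARK_PYTHON"
else
  echo "env: '$PYSPARK_PYTHON': No such file or directory"
fi
```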

Interestingly, spark-shell launches just fine.

I'm using Ubuntu 18.04.1 and Spark 2.1.

See the screenshots below.

I've tried every guide I can find, and since this is my first time setting up Spark, I'm not sure how to troubleshoot from here.

Thank you

[Screenshots: attempting to execute pyspark; .profile; versions]

Cheddar asked Sep 06 '18


4 Answers

You should set export PYSPARK_PYTHON=python3 instead of export PYSPARK_PYTHON=python3.6.5 in your .profile. PYSPARK_PYTHON must name an executable, and there is no command called python3.6.5.

Then source .profile, of course.

That worked for me.

As for other options: installing Python with sudo apt install python (which gives you 2.x) is not appropriate.
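A minimal sketch of the corrected .profile lines (the Spark path is taken from the question; putting $SPARK_HOME/bin rather than $SPARK_HOME on the PATH is an assumption about the intended setup, since bin/ is where the pyspark launcher lives):

```shell
# Sketch of corrected .profile lines -- paths from the question, not verified on this system.
export SPARK_HOME="$HOME/spark-2.1.0-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$PATH"   # bin/ holds the pyspark launcher
export PYSPARK_PYTHON=python3         # an executable name, not a version number
```

Reload with source ~/.profile and re-run pyspark.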

Tansu Dasli answered Oct 19 '22


For those who may come across this, I figured it out!

I specifically chose to use an older version of Spark in order to follow along with a tutorial I was watching - Spark 2.1.0. I did not know that the latest version of Python (3.5.6 at the time of writing this) is incompatible with Spark 2.1. Thus PySpark would not launch.

I solved this by using Python 2.7 and setting the path accordingly in .bashrc:

export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7
export PYSPARK_PYTHON=python2.7
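Whichever interpreter you settle on, you can confirm the name actually resolves before launching pyspark (python2.7 here is this answer's choice; substitute your own):

```shell
# Verify that PYSPARK_PYTHON names a real executable on the PATH.
PYSPARK_PYTHON=python2.7
if command -v "$PYSPARK_PYTHON" >/dev/null 2>&1; then
  echo "ok: $PYSPARK_PYTHON resolves"
else
  echo "missing: $PYSPARK_PYTHON is not on the PATH"
fi
```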
Cheddar answered Oct 18 '22


People using Python 3.8 with Spark <= 2.4.5 will hit the same problem.

In that case, the only solution I found was to update Spark to version 3.0.0.

See https://bugs.python.org/issue38775
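A quick way to check in advance whether your interpreter falls in the incompatible range (the 3.8 cutoff is taken from this answer, not from the Spark docs):

```shell
# Print a hint if python3 is too new for Spark <= 2.4.5 (cutoff per this answer).
python3 -c 'import sys; print("OK for Spark <= 2.4.5" if sys.version_info < (3, 8) else "use Spark 3.0.0+")'
```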

Javier Vargas answered Oct 18 '22


For GNU/Linux users who have the python3 package installed (especially on Ubuntu/Debian distros), there is a package called python-is-python3 that makes the python command resolve to python3:

# apt install python-is-python3

Python 2.7 is deprecated now (as of Ubuntu 20.10, 2020), so do not try installing it.
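The package essentially installs a python symlink that points at python3. A minimal simulation of that mechanism in a temporary directory (the "interpreter" here is a stand-in script, not the real python3):

```shell
# Simulate what python-is-python3 does: provide a 'python' symlink to python3.
tmp=$(mktemp -d)
printf '#!/bin/sh\necho "I am python3"\n' > "$tmp/python3"   # stand-in interpreter
chmod +x "$tmp/python3"
ln -s "$tmp/python3" "$tmp/python"   # the real package links /usr/bin/python -> python3
PATH="$tmp:$PATH" python             # 'python' now runs the python3 stand-in
rm -rf "$tmp"
```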

damoonimani answered Oct 18 '22