
Apache Spark: How to use pyspark with Python 3

I built Spark 1.4 from the GH development master, and the build went through fine. But when I do a bin/pyspark I get the Python 2.7.9 version. How can I change this?

Asked May 16 '15 by tchakravarty

People also ask

Does PySpark work with Python 3?

The current version of PySpark is 2.4.3 and works with Python 2.7, 3.3, and above.
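The quickest way to confirm which interpreter a pyspark shell actually picked up is to check sys.version_info from inside it; the same check works in any Python process:

```python
import sys

# Prints the major version of the interpreter running this code;
# inside bin/pyspark it reflects whatever PYSPARK_PYTHON resolved to.
print(sys.version_info.major)
```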

Which Python version is best for PySpark?

PySpark requires Java version 7 or later and Python version 2.6 or later.

Do I need to install Spark before PySpark?

PySpark is a Spark library written in Python to run Python applications using Apache Spark capabilities. so there is no PySpark library to download. All you need is Spark.


3 Answers

Just set the environment variable:

export PYSPARK_PYTHON=python3

In case you want this to be a permanent change, add this line to the pyspark script.
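A minimal shell sketch of the session-only approach, assuming python3 is on your PATH:

```shell
# Point PySpark at Python 3 for the current shell session only
export PYSPARK_PYTHON=python3

# Sanity-check: the variable is set and resolves to a real interpreter
echo "$PYSPARK_PYTHON"
command -v "$PYSPARK_PYTHON"
```

Exporting in the shell affects only that session; the next answer shows a permanent variant via ~/.profile.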

Answered Oct 17 '22 by Rtik88


PYSPARK_PYTHON=python3 
./bin/pyspark

If you want to run it in IPython Notebook, write:

PYSPARK_PYTHON=python3 
PYSPARK_DRIVER_PYTHON=ipython 
PYSPARK_DRIVER_PYTHON_OPTS="notebook" 
./bin/pyspark

If python3 is not accessible, you need to pass the full path to it instead.

Bear in mind that the current documentation (as of 1.4.1) had outdated instructions. Fortunately, it has been patched.
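Resolving that full path can itself be scripted; a sketch using command -v, with the pyspark launch commented out since it needs a Spark checkout:

```shell
# Resolve the absolute path of the Python 3 interpreter
PY3_PATH="$(command -v python3)"
echo "$PY3_PATH"

# Launch PySpark with the explicit interpreter path (requires a Spark build):
# PYSPARK_PYTHON="$PY3_PATH" ./bin/pyspark
```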

Answered Oct 17 '22 by Piotr Migdal


1. Edit your profile: vim ~/.profile

2. Add this line to the file: export PYSPARK_PYTHON=python3

3. Reload the profile: source ~/.profile

4. Run ./bin/pyspark
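The steps above can be scripted; this sketch appends the export line to ~/.profile only if it is not already present, so re-running it stays harmless:

```shell
LINE='export PYSPARK_PYTHON=python3'
PROFILE="$HOME/.profile"

# Append only if the exact line is missing (idempotent)
grep -qxF "$LINE" "$PROFILE" 2>/dev/null || echo "$LINE" >> "$PROFILE"

# Reload the profile so the current shell picks up the variable
. "$PROFILE"
echo "$PYSPARK_PYTHON"
```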

Answered Oct 17 '22 by yangh