How do I set the driver's Python version in Spark?

I'm using Spark 1.4.0-rc2 so that I can use Python 3 with Spark. If I add export PYSPARK_PYTHON=python3 to my .bashrc file, I can run Spark interactively with Python 3. However, if I want to run a standalone program in local mode, I get an error:

Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions

How can I specify the version of Python for the driver? Setting export PYSPARK_DRIVER_PYTHON=python3 didn't work.
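For context, a minimal standalone run of the kind described above might look like this (the script name is a placeholder; even with the worker-side variable set, the driver here still comes up under the system Python 2.7):

export PYSPARK_PYTHON=python3
$SPARK_HOME/bin/spark-submit my_script.py   # my_script.py is hypothetical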

asked May 28 '15 by Kevin

People also ask

How do I specify a Python version in spark-submit?

You can set the PYSPARK_PYTHON variable in conf/spark-env.sh (in Spark's installation directory) to the absolute path of the desired Python executable.
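For example (a sketch; the interpreter path is illustrative), adding lines like these to conf/spark-env.sh pins both the executors and the driver to the same interpreter:

export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3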

Which version of Python is compatible with spark?

Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.


1 Answer

Setting both PYSPARK_PYTHON=python3 and PYSPARK_DRIVER_PYTHON=python3 works for me.

I did this using export in my .bashrc. In the end, these are the variables I set:

export SPARK_HOME="$HOME/Downloads/spark-1.4.0-bin-hadoop2.4"
export IPYTHON=1
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=ipython3
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
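A quick way to confirm that driver and workers now agree (a sketch; assumes the variables above have been sourced into the current shell):

source ~/.bashrc
$SPARK_HOME/bin/pyspark   # with the options above, this opens an IPython3 notebook
# In a notebook cell, both of these should now report Python 3:
#   import sys; print(sys.version_info[:2])                                              # driver side
#   print(sc.parallelize([1]).map(lambda _: __import__('sys').version_info[:2]).first()) # worker side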

I also followed this tutorial to make it work from within an IPython3 notebook: http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/

answered Oct 05 '22 by fccoelho