I'm using Spark 1.4.0-rc2 so I can use Python 3 with Spark. If I add export PYSPARK_PYTHON=python3 to my .bashrc file, I can run Spark interactively with Python 3. However, if I want to run a standalone program in local mode, I get an error:
Exception: Python in worker has different version 3.4 than that in driver 2.7, PySpark cannot run with different minor versions
How can I specify the version of Python for the driver? Setting export PYSPARK_DRIVER_PYTHON=python3 didn't work.
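For reference, this is roughly how I'm invoking the standalone program (the script name app.py is just a placeholder for my program):

# In ~/.bashrc: workers pick up Python 3, but the driver apparently still defaults to 2.7
export PYSPARK_PYTHON=python3

# Running a standalone program in local mode (app.py is a placeholder name)
# produces the version-mismatch exception above:
spark-submit --master "local[*]" app.py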
You can set the PYSPARK_PYTHON variable in conf/spark-env.sh (in Spark's installation directory) to the absolute path of the desired Python executable.
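For example, a minimal sketch of conf/spark-env.sh (the /usr/bin/python3 path is an assumption; use the path reported by which python3 on your machine):

# conf/spark-env.sh -- sourced by Spark's launch scripts on startup.
# Point both workers and the driver at the same Python 3 interpreter
# (/usr/bin/python3 is an assumed path; adjust for your system).
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3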
(For reference, current Spark releases run on Java 8/11/17, Scala 2.12/2.13, Python 3.7+, and R 3.5+; Spark 1.4 was the first release with Python 3 support.)
Setting both PYSPARK_PYTHON=python3 and PYSPARK_DRIVER_PYTHON=python3 works for me. I did this using export in my .bashrc. In the end, these are the variables I create:
export SPARK_HOME="$HOME/Downloads/spark-1.4.0-bin-hadoop2.4"
export IPYTHON=1
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=ipython3
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
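After adding these, reload the shell config and launch pyspark; a sketch of the expected flow (assuming ipython3 and the notebook package are installed):

# Reload the shell config so the new variables take effect
source ~/.bashrc

# With PYSPARK_DRIVER_PYTHON=ipython3 and PYSPARK_DRIVER_PYTHON_OPTS="notebook",
# this starts the driver as an IPython notebook with a SparkContext (sc)
# available, while workers use the Python 3 set by PYSPARK_PYTHON.
pyspark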
I also followed this tutorial to get it working from within an IPython 3 notebook: http://ramhiser.com/2015/02/01/configuring-ipython-notebook-support-for-pyspark/