 

How to specify the version of Python for spark-submit to use?

I have two versions of Python installed. When I launch an application with spark-submit, it uses the default Python version, but I want it to use the other one. How do I specify which version of Python spark-submit should use?

asked Apr 30 '15 by A7med


1 Answer

You can set the PYSPARK_PYTHON environment variable in conf/spark-env.sh (in Spark's installation directory) to the absolute path of the desired Python executable.

The Spark distribution ships with spark-env.sh.template (spark-env.cmd.template on Windows) by default; it must first be copied or renamed to spark-env.sh (spark-env.cmd).
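The renaming step above can be done like this (a minimal sketch; /opt/spark is an example install path, so substitute your own Spark directory):

```shell
# SPARK_HOME is an assumed example path -- point it at your Spark installation.
SPARK_HOME=/opt/spark

# Copy the shipped template to the name Spark actually loads on startup.
cp "$SPARK_HOME/conf/spark-env.sh.template" "$SPARK_HOME/conf/spark-env.sh"
```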

For example, if the desired Python executable is installed at /opt/anaconda3/bin/python3, add the following line to spark-env.sh:

PYSPARK_PYTHON='/opt/anaconda3/bin/python3' 

Check out the configuration documentation for more information.
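If you would rather not edit spark-env.sh, the same variable can be set for a single run by prefixing the spark-submit command; the interpreter path and my_app.py below are illustrative placeholders:

```shell
# One-off override: PYSPARK_PYTHON applies only to this invocation.
# /opt/anaconda3/bin/python3 and my_app.py are example names -- replace with yours.
PYSPARK_PYTHON=/opt/anaconda3/bin/python3 spark-submit my_app.py
```

This relies on the standard shell mechanism of setting an environment variable for just one command, so it leaves the default Python untouched for other applications.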

answered Sep 23 '22 by Benjamin Rowell