What's the "correct" way to set the sys.path for Python worker nodes?
Is it a good idea for worker nodes to "inherit" the sys.path from the master?
Is it a good idea to set the path on the worker nodes through .bashrc? Or is there some standard Spark way of setting it?
A standard way of setting environment variables, including PYSPARK_PYTHON, is to use the conf/spark-env.sh file. Spark comes with a template file (conf/spark-env.sh.template) which explains the most common options.
It is a normal bash script, so you can use it the same way as you would use .bashrc.
You'll find more details in the Spark Configuration Guide.
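If you want to confirm which interpreter the executors actually pick up after editing spark-env.sh, one way is to ask the executor tasks for their sys.executable. This is only a minimal sketch, assuming an already running SparkSession named spark:

import sys

# Each task runs on an executor and reports the Python binary it was launched with.
executor_pythons = (
    spark.sparkContext
    .parallelize(range(4), 4)          # tiny RDD, one element per partition
    .map(lambda _: sys.executable)     # evaluated on the executors
    .distinct()
    .collect()
)
print("driver python:   ", sys.executable)
print("executor pythons:", executor_pythons)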
With the following command you can change the Python path for the current job only; it also allows a different Python path for the driver and the executors:
PYSPARK_DRIVER_PYTHON=/home/user1/anaconda2/bin/python PYSPARK_PYTHON=/usr/local/anaconda2/bin/python pyspark --master ..
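If you are on Spark 2.1 or later, the same per-job override can also be expressed through Spark configuration properties rather than environment variables. A hedged equivalent of the command above (keeping the interpreter paths and the elided --master value as placeholders) would be:

pyspark --master ... \
  --conf spark.pyspark.driver.python=/home/user1/anaconda2/bin/python \
  --conf spark.pyspark.python=/usr/local/anaconda2/bin/python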
You may do either of the following.
In config: update SPARK_HOME/conf/spark-env.sh and add the lines below:
# for pyspark
export PYSPARK_PYTHON="path/to/python"
# for driver, defaults to PYSPARK_PYTHON
export PYSPARK_DRIVER_PYTHON="path/to/python"
OR
In the code, add:
import os
# Set spark environments
os.environ['PYSPARK_PYTHON'] = 'path/to/python'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'path/to/python'
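Note that these environment variables must be set before the SparkSession (or SparkContext) is created, otherwise the Python workers may already have been launched with the default interpreter. A minimal sketch, assuming PySpark is installed and with placeholder interpreter paths:

import os

# The interpreter paths below are placeholders; substitute real paths on your cluster.
os.environ['PYSPARK_PYTHON'] = '/path/to/python'
os.environ['PYSPARK_DRIVER_PYTHON'] = '/path/to/python'

from pyspark.sql import SparkSession

# Build the session only after the environment variables are in place,
# so the workers are started with the interpreter chosen above.
spark = SparkSession.builder.appName("python-path-example").getOrCreate()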