What's the "correct" way to set the sys.path for Python worker nodes?
Is it a good idea for worker nodes to "inherit" the sys.path from the master?
Is it a good idea to set the path on the worker nodes through .bashrc? Or is there some standard Spark way of setting it?
A standard way of setting environment variables, including PYSPARK_PYTHON, is to use the conf/spark-env.sh file. Spark comes with a template file (conf/spark-env.sh.template) which explains the most common options.
It is a normal bash script, so you can use it the same way as you would use .bashrc.
You'll find more details in the Spark Configuration Guide.
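If you want to confirm which interpreter the executors actually pick up after editing spark-env.sh, one way is to ask the executor tasks for their sys.executable. This is only a minimal sketch, assuming an already running SparkSession named spark:

import sys

# Each task runs on an executor and reports the Python binary it was launched with.
executor_pythons = (
    spark.sparkContext
    .parallelize(range(4), 4)          # tiny RDD, one element per partition
    .map(lambda _: sys.executable)     # evaluated on the executors
    .distinct()
    .collect()
)
print("driver python:   ", sys.executable)
print("executor pythons:", executor_pythons)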
With the following command you can change the Python path for the current job only; it also allows a different Python path for the driver and the executors:
PYSPARK_DRIVER_PYTHON=/home/user1/anaconda2/bin/python PYSPARK_PYTHON=/usr/local/anaconda2/bin/python pyspark --master ..
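If you are on Spark 2.1 or later, the same per-job override can also be expressed through Spark configuration properties rather than environment variables. A hedged equivalent of the command above (keeping the interpreter paths and the elided --master value as placeholders) would be:

pyspark --master ... \
  --conf spark.pyspark.driver.python=/home/user1/anaconda2/bin/python \
  --conf spark.pyspark.python=/usr/local/anaconda2/bin/python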
You may do either of the following.
In config: update SPARK_HOME/conf/spark-env.sh and add the lines below:
# for pyspark
export PYSPARK_PYTHON="path/to/python"
# for driver, defaults to PYSPARK_PYTHON
export PYSPARK_DRIVER_PYTHON="path/to/python"
OR
In the code, add:
import os
# Set spark environments
os.environ['PYSPARK_PYTHON'] = 'path/to/python'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'path/to/python'
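Note that these environment variables must be set before the SparkSession (or SparkContext) is created, otherwise the Python workers may already have been launched with the default interpreter. A minimal sketch, assuming PySpark is installed and with placeholder interpreter paths:

import os

# The interpreter paths below are placeholders; substitute real paths on your cluster.
os.environ['PYSPARK_PYTHON'] = '/path/to/python'
os.environ['PYSPARK_DRIVER_PYTHON'] = '/path/to/python'

from pyspark.sql import SparkSession

# Build the session only after the environment variables are in place,
# so the workers are started with the interpreter chosen above.
spark = SparkSession.builder.appName("python-path-example").getOrCreate()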