
Set Python path for Spark worker

What's the "correct" way to set the sys path for Python worker node?

Is it a good idea for worker nodes to "inherit" sys path from master?

Is it a good idea to set the path in the worker nodes' through .bashrc? Or is there some standard Spark way of setting it?

asked Oct 06 '15 by user3240688

3 Answers

A standard way of setting environment variables, including PYSPARK_PYTHON, is to use the conf/spark-env.sh file. Spark comes with a template file (conf/spark-env.sh.template) which explains the most common options.

It is a normal bash script, so you can use it the same way you would use .bashrc.

You'll find more details in the Spark Configuration Guide.
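
For example, a minimal sketch (the interpreter path is only an illustration, and SPARK_HOME is assumed to point at your Spark installation):

    cp $SPARK_HOME/conf/spark-env.sh.template $SPARK_HOME/conf/spark-env.sh
    # every node will start its Python workers with this interpreter
    echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> $SPARK_HOME/conf/spark-env.sh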

answered Sep 26 '22 by zero323


With the following command you can change the Python path for the current job only, which also allows different Python paths for the driver and the executors:

    PYSPARK_DRIVER_PYTHON=/home/user1/anaconda2/bin/python PYSPARK_PYTHON=/usr/local/anaconda2/bin/python pyspark --master ..
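
The same can be done through spark-submit configuration properties; a sketch, assuming Spark 2.1 or later (where spark.pyspark.python and spark.pyspark.driver.python are available) and a hypothetical my_job.py script:

    spark-submit \
      --conf spark.pyspark.driver.python=/home/user1/anaconda2/bin/python \
      --conf spark.pyspark.python=/usr/local/anaconda2/bin/python \
      my_job.py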
answered Sep 26 '22 by Peter Pan


You may do either of the following.

In the config:

Update SPARK_HOME/conf/spark-env.sh and add the lines below:

    # for pyspark
    export PYSPARK_PYTHON="path/to/python"
    # for driver, defaults to PYSPARK_PYTHON
    export PYSPARK_DRIVER_PYTHON="path/to/python"

OR

In the code, add:

    import os
    # Set Spark environment variables
    os.environ['PYSPARK_PYTHON'] = 'path/to/python'
    os.environ['PYSPARK_DRIVER_PYTHON'] = 'path/to/python'
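
Note that these variables must be set before the SparkContext/SparkSession is created. A minimal sketch of the ordering, with a placeholder interpreter path and app name:

    import os

    # must be set before any Spark context exists in this process
    os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
    os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/bin/python3'

    from pyspark.sql import SparkSession

    # workers launched for this application will use the interpreter set above
    spark = SparkSession.builder.master('local[*]').appName('path-demo').getOrCreate()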
answered Sep 24 '22 by Ani Menon