Pyspark append executor environment variable

Is it possible to append a value to the PYTHONPATH of a worker in Spark?

I know it is possible to go to each worker node and configure the spark-env.sh file, but I want a more flexible approach.

I am trying to use the setExecutorEnv method, but with no success:

from pyspark import SparkConf

conf = SparkConf().setMaster("spark://192.168.10.11:7077") \
                  .setAppName("myname") \
                  .set("spark.cassandra.connection.host", "192.168.10.11") \
                  .setExecutorEnv('PYTHONPATH', '$PYTHONPATH:/custom_dir_that_I_want_to_append/')

It creates a pythonpath environment variable on each executor, forces the name to lower case, and does not interpret $PYTHONPATH to append the value.

I end up with two different environment variables:

pythonpath  :  $PYTHONPATH:/custom_dir_that_I_want_to_append
PYTHONPATH  :  /old/path/to_python

The first one is dynamically created and the second one already existed before.

Does anyone know how to do it?

asked Nov 25 '16 by guilhermecgs

1 Answer

I figured it out myself...

The problem is not with Spark, but with ConfigParser.

Based on this answer, I fixed ConfigParser so that it always preserves case.
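
For reference, a minimal sketch of that ConfigParser fix (Python 3 configparser; the file and section names are just placeholders for illustration):

from configparser import ConfigParser  # ConfigParser.ConfigParser on Python 2

parser = ConfigParser()
# optionxform lowercases option names by default; replacing it with str
# (the identity transform) keeps keys like PYTHONPATH in their original case.
parser.optionxform = str
parser.read('spark_settings.ini')                  # placeholder file name

for key, value in parser.items('executor_env'):    # placeholder section name
    print(key, value)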

After this, I found out that Spark's default behavior is to append the value to an existing worker environment variable if one with the same name already exists.

So it is not necessary to reference $PYTHONPATH with the dollar sign:

.setExecutorEnv('PYTHONPATH', '/custom_dir_that_I_want_to_append/')
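
Putting it together, a sketch of the full configuration with the corrected call (master URL, app name and Cassandra host copied from the question above):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("spark://192.168.10.11:7077") \
                  .setAppName("myname") \
                  .set("spark.cassandra.connection.host", "192.168.10.11") \
                  .setExecutorEnv('PYTHONPATH', '/custom_dir_that_I_want_to_append/')

# Spark appends this value to the executor's existing PYTHONPATH,
# so no '$PYTHONPATH:' prefix is needed.
sc = SparkContext(conf=conf)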
answered Nov 15 '22 by guilhermecgs