
How to pass environment variables to spark driver in cluster mode with spark-submit

spark-submit allows you to configure the executor environment variables with --conf spark.executorEnv.FOO=bar, and the Spark REST API lets you pass some environment variables with the environmentVariables field. Unfortunately, I've found nothing similar for configuring the environment variables of the driver when submitting the application with spark-submit in cluster mode:

spark-submit --deploy-mode cluster myapp.jar

Is it possible to set the environment variables of the driver with spark-submit in cluster mode?

asked Jun 17 '16 by Gaëtan Lehmann

People also ask

How do I set environment variables in Spark?

Spark properties control most application parameters and can be set by using a SparkConf object, or through Java system properties. Environment variables can be used to set per-machine settings, such as the IP address, through the conf/spark-env.sh script on each node.
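
A minimal sketch of such a conf/spark-env.sh (the variable names are standard Spark settings, but the values are purely illustrative):

# conf/spark-env.sh -- sourced on each node when Spark processes start
export SPARK_LOCAL_IP=192.168.1.10   # illustrative: bind Spark on this node to a specific IP
export SPARK_WORKER_MEMORY=4g        # illustrative: memory the worker may hand out to executors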

How do I run Spark submit in cluster mode?

You can submit a Spark batch application in cluster mode (the default) or client mode, either from inside the cluster or from an external client. In cluster mode the driver runs on a host in your driver resource group; the spark-submit syntax is --deploy-mode cluster.

Can we run the Spark submit in local mode in cluster?

No. The spark-submit parameters num-executors, executor-cores and executor-memory won't work in local mode, because those parameters apply when you deploy your Spark job on a cluster rather than on a single machine; they only take effect when you run your job in client or cluster mode.
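
For illustration, a purely local run looks like this; parallelism comes from the master URL, and any executor flags passed alongside it would simply be ignored (myapp.jar is the placeholder from the question):

spark-submit --master local[4] myapp.jar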


2 Answers

On YARN at least, this works:

spark-submit --deploy-mode cluster --conf spark.yarn.appMasterEnv.FOO=bar myapp.jar


It's mentioned in http://spark.apache.org/docs/latest/configuration.html#environment-variables that:

Note: When running Spark on YARN in cluster mode, environment variables need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName] property in your conf/spark-defaults.conf file.

I have tested that it can also be passed with the --conf flag of spark-submit, so that you don't have to edit global conf files.
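
Once set this way, the variable is visible to the driver process like any other environment variable. If you want to confirm it actually arrived, a minimal sketch in Scala (the object name MyApp is hypothetical; FOO is the placeholder variable from this thread):

import org.apache.spark.sql.SparkSession

object MyApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("myapp").getOrCreate()
    // sys.env exposes the driver's environment; FOO was injected by
    // spark.yarn.appMasterEnv.FOO when running on YARN in cluster mode.
    val foo = sys.env.getOrElse("FOO", "<not set>")
    println(s"FOO=$foo")
    spark.stop()
  }
}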

answered Sep 18 '22 by juhoautio


On YARN in cluster mode, it worked for me to add the environment variables to the spark-submit command with --conf, as below:

spark-submit --master yarn-cluster --num-executors 15 --executor-memory 52g --executor-cores 7 --driver-memory 52g --conf "spark.yarn.appMasterEnv.FOO=/Path/foo" --conf "spark.executorEnv.FOO2=/path/foo2" app.jar

Alternatively, you can add them to the conf/spark-defaults.conf file.
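
For example, the --conf settings from the command above map to these spark-defaults.conf lines (same illustrative names and paths; each line is a key and a value separated by whitespace):

spark.yarn.appMasterEnv.FOO /Path/foo
spark.executorEnv.FOO2      /path/foo2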

answered Sep 18 '22 by Jyoti Gupta