spark-submit allows you to configure the executor environment variables with --conf spark.executorEnv.FOO=bar, and the Spark REST API allows passing environment variables via the environmentVariables field.
Unfortunately, I've found nothing similar for configuring the driver's environment variables when submitting the application with spark-submit in cluster mode:
spark-submit --deploy-mode cluster myapp.jar
Is it possible to set the driver's environment variables with spark-submit in cluster mode?
On YARN at least, this works:
spark-submit --deploy-mode cluster --conf spark.yarn.appMasterEnv.FOO=bar myapp.jar
It's mentioned in http://spark.apache.org/docs/latest/configuration.html#environment-variables that:
Note: When running Spark on YARN in cluster mode, environment variables need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName] property in your conf/spark-defaults.conf file.
I have tested that it can also be passed with the --conf flag to spark-submit, so you don't have to edit global conf files.
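For completeness, here is a minimal sketch of driver code that checks whether the variable actually reached the driver process (Scala here; FOO is just the placeholder variable name from the example above):

// Driver-side check: prints the value of FOO as seen by the driver process.
// FOO is the placeholder name from the spark-submit example above.
import org.apache.spark.sql.SparkSession

object EnvCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("EnvCheck").getOrCreate()
    println(s"Driver sees FOO=${sys.env.getOrElse("FOO", "<not set>")}")
    spark.stop()
  }
}

When submitted with --conf spark.yarn.appMasterEnv.FOO=bar in YARN cluster mode, the driver log should show the value; without that property the variable would be missing.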
On YARN in cluster mode, it worked for me by adding the environment variables to the spark-submit command using --conf, as below:
spark-submit --master yarn-cluster --num-executors 15 --executor-memory 52g --executor-cores 7 --driver-memory 52g --conf "spark.yarn.appMasterEnv.FOO=/Path/foo" --conf "spark.executorEnv.FOO2=/path/foo2" app.jar
Also, you can do it by adding them to the conf/spark-defaults.conf file.
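For reference, a sketch of what the equivalent entries in conf/spark-defaults.conf might look like (the property names come from the command above; the paths are just placeholders):

# Environment variable for the driver (in YARN cluster mode the driver runs inside the application master)
spark.yarn.appMasterEnv.FOO    /Path/foo
# Environment variable for the executors
spark.executorEnv.FOO2         /path/foo2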