I've been running my Spark jobs in "client" mode during development, using --files to share config files with the executors while the driver read the config files locally. Now I want to deploy the job in "cluster" mode, and I'm having difficulty sharing the config files with the driver.
For example, I'm passing the config file name as extraJavaOptions to both the driver and the executors, and reading the file with SparkFiles.get():
val configFile = org.apache.spark.SparkFiles.get(System.getProperty("config.file.name"))
This works well on the executors but fails on the driver. I think the files are only shared with the executors, not with the container where the driver is running. One option is to keep the config files in S3, but I wanted to check whether this can be achieved with spark-submit. Here is my submit command:
> spark-submit --deploy-mode cluster --master yarn --driver-cores 2 \
>   --driver-memory 4g --num-executors 4 --executor-cores 4 --executor-memory 10g \
>   --files /home/hadoop/Streaming.conf,/home/hadoop/log4j.properties \
>   --conf spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
>   --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=log4j.properties -Dconfig.file.name=Streaming.conf" \
>   --class ....
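For illustration, the client-mode setup described above looks roughly like this (the object name and the driver-side path are placeholders; only the SparkFiles.get() call matches my actual code):

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object StreamingJob {  // placeholder name
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("StreamingJob").getOrCreate()

    // Driver side in client mode: reads the config from its local path on the
    // machine that ran spark-submit.
    val driverConfPath = "/home/hadoop/Streaming.conf"
    println(s"driver reads config from: $driverConfPath")

    // Executor side: -Dconfig.file.name=Streaming.conf is set through
    // spark.executor.extraJavaOptions, and --files ships the file to the executors.
    spark.sparkContext.parallelize(1 to 4).foreach { _ =>
      val execConfPath = SparkFiles.get(System.getProperty("config.file.name"))
      println(s"executor sees config at: $execConfPath")
    }

    spark.stop()
  }
}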
You can submit a Spark batch application in either cluster mode or client mode. The --deploy-mode option of spark-submit specifies where the driver program of your Spark application runs: in cluster mode the driver runs on one of the worker nodes in the cluster, and that node shows up as the driver on the Spark Web UI of your application; cluster mode is typically used for production jobs. In client mode the driver runs on the machine from which the application was submitted.
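For reference, the two modes differ only in the --deploy-mode flag on the submit command (the class and jar names below are placeholders):

spark-submit --master yarn --deploy-mode client --class com.example.StreamingJob streaming-job.jar
spark-submit --master yarn --deploy-mode cluster --class com.example.StreamingJob streaming-job.jar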
Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark’s own standalone cluster manager, Mesos, YARN or Kubernetes), which allocate resources across applications.
The spark-submit script can load default Spark configuration values from a properties file and pass them on to your application. By default, it will read options from conf/spark-defaults.conf in the Spark directory. For more detail, see the section on loading default configurations.
You need to try the --properties-file option of the spark-submit command. For example, a properties file with the following content:
spark.key1=value1
spark.key2=value2
All the keys need to be prefixed with spark. Then use spark-submit like this to pass the properties file:
bin/spark-submit --properties-file propertiesfile.properties
Then in your code you can get the keys using the SparkContext getConf method:
sc.getConf.get("spark.key1") // returns value1
Once you have the key values, you can use them anywhere in your application.
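Putting it together, a minimal driver-side sketch (the object and app names are placeholders; it assumes the propertiesfile.properties shown above):

import org.apache.spark.sql.SparkSession

object PropertiesFileExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PropertiesFileExample").getOrCreate()

    // Keys passed via --properties-file land in the application's SparkConf,
    // so the driver can read them in both client and cluster deploy modes.
    val value1 = spark.sparkContext.getConf.get("spark.key1")           // "value1"
    val value2 = spark.sparkContext.getConf.get("spark.key2", "none")   // with a default

    println(s"spark.key1=$value1, spark.key2=$value2")
    spark.stop()
  }
}

Since the values travel with the Spark configuration rather than as a separate file, this avoids the SparkFiles.get() problem on the driver for simple key/value settings.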
I found a solution for this problem in this thread.
You can give an alias to a file you submit through --files by appending '#alias' to its path. With this trick, you should be able to access the files through their alias.
For example, the following code runs without an error:
spark-submit --master yarn-cluster --files test.conf#testFile.conf test.py
with test.py as:
path_f = 'testFile.conf'
try:
    f = open(path_f, 'r')
except:
    raise Exception('File not opened', 'EEEEEEE!')
and an empty test.conf
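The Scala equivalent on the driver side would look roughly like this (a sketch, assuming the job was submitted with --files /home/hadoop/Streaming.conf#Streaming.conf; in yarn cluster mode the aliased file is localized into the driver container's working directory):

import scala.io.Source

object AliasedConfigCheck {  // placeholder name
  def main(args: Array[String]): Unit = {
    // The alias is a plain relative path in the driver's working directory.
    val source = Source.fromFile("Streaming.conf")
    try {
      println(source.getLines().mkString("\n"))
    } finally {
      source.close()
    }
  }
}

In client mode the driver simply keeps reading its local copy by its full path, as in the original setup.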