I am using the spark-submit command for executing Spark jobs with parameters such as:
spark-submit --master yarn-cluster --driver-cores 2 \
--driver-memory 2G --num-executors 10 \
--executor-cores 5 --executor-memory 2G \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Now I want to execute the same program using Spark's Dynamic Resource Allocation. Could you please help with the usage of Dynamic Resource Allocation in executing Spark programs?
Dynamic Resource Allocation

Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. The feature is toggled with spark.dynamicAllocation.enabled, and you can also specify the upper and lower bounds of the resources that should be allocated to your application.
For Spark dynamic allocation, spark.dynamicAllocation.enabled needs to be set to true because it is false by default. This in turn requires spark.shuffle.service.enabled to be set to true, since the application is running on YARN. See the "Configuring the External Shuffle Service" section of the Spark-on-YARN documentation for how to start the shuffle service on each NodeManager in YARN.
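For convenience, here is a sketch of the usual NodeManager-side setup as described in the Spark-on-YARN docs; the jar name depends on your Spark version, so treat <version> as a placeholder. Add the following to yarn-site.xml on every NodeManager:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

Then put spark-<version>-yarn-shuffle.jar on each NodeManager's classpath and restart all NodeManagers.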
The following configurations are also relevant: spark.dynamicAllocation.minExecutors, spark.dynamicAllocation.maxExecutors, and spark.dynamicAllocation.initialExecutors.
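For reference, in recent Spark versions minExecutors defaults to 0, maxExecutors defaults to infinity, and initialExecutors defaults to the value of minExecutors, so you usually only need to override the bounds that matter for your workload.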
These options can be configured for a Spark application in three ways:
1. From spark-submit with --conf <prop_name>=<prop_value>
spark-submit --master yarn-cluster \
--driver-cores 2 \
--driver-memory 2G \
--executor-cores 5 \
--executor-memory 2G \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.minExecutors=5 \
--conf spark.dynamicAllocation.maxExecutors=30 \
--conf spark.dynamicAllocation.initialExecutors=10 \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Note: spark.dynamicAllocation.initialExecutors=10 has the same effect here as --num-executors 10 had in the static command; --num-executors is dropped because it pins the executor count instead of letting it scale.
2. Inside the Spark program with SparkConf

Set the properties in SparkConf, then create the SparkSession or SparkContext with it:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "5")
  .set("spark.dynamicAllocation.maxExecutors", "30")
  .set("spark.dynamicAllocation.initialExecutors", "10")
val spark = SparkSession.builder().config(conf).getOrCreate()
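Note that these properties have to be in place before the SparkSession or SparkContext is created; dynamic allocation settings are read at context startup and cannot be changed on a running context.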
3. In spark-defaults.conf, usually located in $SPARK_HOME/conf/

Place the same configurations in spark-defaults.conf to apply them to all Spark applications whenever no configuration is passed on the command line or in code.
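For example, with the same illustrative values as the spark-submit command above, the entries in spark-defaults.conf would look like:

spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      5
spark.dynamicAllocation.maxExecutors      30
spark.dynamicAllocation.initialExecutors  10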
Reference: Spark - Dynamic Allocation Confs