I am using the spark-submit command for executing Spark jobs with parameters such as:
spark-submit --master yarn-cluster --driver-cores 2 \
--driver-memory 2G --num-executors 10 \
--executor-cores 5 --executor-memory 2G \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Now I want to execute the same program using Spark's Dynamic Resource Allocation. Could you please help with the usage of Dynamic Resource Allocation in executing Spark programs?
Dynamic Resource Allocation

Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. The feature is toggled with spark.dynamicAllocation.enabled, and you can also specify the upper and lower bounds of the resources that should be allocated to your application.
For Spark dynamic allocation, spark.dynamicAllocation.enabled needs to be set to true because it is false by default. This in turn requires spark.shuffle.service.enabled to be set to true, since the application is running on YARN. See the "Configuring the External Shuffle Service" section of the Spark-on-YARN documentation for how to start the shuffle service on each NodeManager in YARN.
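For convenience, here is a sketch of the usual NodeManager-side setup as described in the Spark-on-YARN docs; the jar name depends on your Spark version, so treat <version> as a placeholder. Add the following to yarn-site.xml on every NodeManager:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

Then put spark-<version>-yarn-shuffle.jar on each NodeManager's classpath and restart all NodeManagers.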
The following configurations are also relevant: spark.dynamicAllocation.minExecutors, spark.dynamicAllocation.maxExecutors, and spark.dynamicAllocation.initialExecutors.
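For reference, in recent Spark versions minExecutors defaults to 0, maxExecutors defaults to infinity, and initialExecutors defaults to the value of minExecutors, so you usually only need to override the bounds that matter for your workload.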
These options can be configured for a Spark application in three ways:
1. From spark-submit with --conf <prop_name>=<prop_value>
spark-submit --master yarn-cluster \
--driver-cores 2 \
--driver-memory 2G \
--executor-cores 5 \
--executor-memory 2G \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.minExecutors=5 \
--conf spark.dynamicAllocation.maxExecutors=30 \
--conf spark.dynamicAllocation.initialExecutors=10 \
--class com.spark.sql.jdbc.SparkDFtoOracle2 \
Spark-hive-sql-Dataframe-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Note: spark.dynamicAllocation.initialExecutors=10 has the same effect here as --num-executors 10 had in the static command; --num-executors is dropped because it pins the executor count instead of letting it scale.
2. Inside the Spark program with SparkConf

Set the properties in SparkConf, then create the SparkSession or SparkContext with it:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf: SparkConf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "5")
  .set("spark.dynamicAllocation.maxExecutors", "30")
  .set("spark.dynamicAllocation.initialExecutors", "10")
val spark = SparkSession.builder().config(conf).getOrCreate()
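Note that these properties have to be in place before the SparkSession or SparkContext is created; dynamic allocation settings are read at context startup and cannot be changed on a running context.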
3. In spark-defaults.conf, usually located in $SPARK_HOME/conf/

Place the same configurations in spark-defaults.conf to apply them to all Spark applications whenever no configuration is passed on the command line or in code.
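For example, with the same illustrative values as the spark-submit command above, the entries in spark-defaults.conf would look like:

spark.dynamicAllocation.enabled           true
spark.shuffle.service.enabled             true
spark.dynamicAllocation.minExecutors      5
spark.dynamicAllocation.maxExecutors      30
spark.dynamicAllocation.initialExecutors  10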
Reference: Spark - Dynamic Allocation Confs