Can specifying --num-executors in the spark-submit command override already enabled dynamic allocation (spark.dynamicAllocation.enabled=true)?
Dynamic Resource Allocation. Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand.
YARN: The --num-executors option to the Spark YARN client controls how many executors it will allocate on the cluster (spark.executor.instances as configuration property), while --executor-memory (spark.executor.memory configuration property) and --executor-cores (spark.executor.cores configuration property) control the resources per executor.
To disable Dynamic Allocation, set spark.dynamicAllocation.enabled to false. You can also specify the upper and lower bound of the resources that should be allocated to your application.
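If the goal is a fixed number of executors for one particular job, a simple sketch (class and jar names are placeholders and the executor count is illustrative) is to switch dynamic allocation off for that submit and set --num-executors explicitly; a --conf passed on the command line takes precedence over spark-defaults.conf:
spark-submit --master yarn \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 3 \
  --class com.example.MyApp my-app.jar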
You can see this in the log:
INFO util.Utils: Using initial executors = 60,
max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
That means Spark will take max(spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors, spark.executor.instances) as the initial number of executors; spark.executor.instances is what --num-executors sets.
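For example (the initialExecutors and minExecutors values here are illustrative; only the 60 comes from the log above):
spark.dynamicAllocation.initialExecutors=10
spark.dynamicAllocation.minExecutors=5
spark.executor.instances=60   (i.e. --num-executors 60)
initial executors = max(10, 5, 60) = 60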
In your spark-defaults.conf file you can set the following to control the behaviour of dynamic allocation on Spark2:
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.initialExecutors=1
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=5
If your spark2-submit command does not specify anything, then your job starts with 1 executor and scales up to 5 if required.
If your spark2-submit command specifies the following
--num-executors=3
then your job will start with 3 executors and still grow to 5 executors if required.
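As a concrete sketch (class and jar names are placeholders), that submit might look like:
spark2-submit --master yarn \
  --num-executors=3 \
  --class com.example.MyApp my-app.jar
With the spark-defaults.conf values above still in effect, the job starts with 3 executors (max of 3, 1 and 1) and dynamic allocation can still grow it to the configured maximum of 5.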
Check your log messages for
Using initial executors = [initialExecutors], max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
Additionally, if you do not specify spark.dynamicAllocation.maxExecutors at all then, given a resource-hungry job, it will continue to grab as many executors as it can (on YARN this may be restricted by a limit defined on the queue you submitted your job to). I have seen "rogue" Spark jobs on YARN hog huge amounts of resource on shared clusters, starving other jobs. Your YARN administrators should prevent resource starvation by configuring sensible defaults and splitting different types of workloads across different queues.
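One simple per-job safeguard (again a sketch; the cap, class and jar names are placeholders) is to pass an explicit cap on the command line:
spark2-submit --master yarn \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --class com.example.MyApp my-app.jar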
I would advise performance-testing any changes you make to override the defaults, ideally simulating busy periods of your system.