
Can num-executors override dynamic allocation in spark-submit

Can specifying --num-executors in a spark-submit command override already enabled dynamic allocation (spark.dynamicAllocation.enabled=true)?

asked Jan 20 '18 by Arvind Kumar

People also ask

What is dynamic allocation in Spark submit?

Dynamic Resource Allocation. Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand.
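As a minimal sketch of turning this on at submit time (the jar name and master are placeholders; on YARN, dynamic allocation also requires the external shuffle service):

spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  your-app.jar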

What is num-executors in Spark?

YARN: The --num-executors option to the Spark YARN client controls how many executors it will allocate on the cluster (spark.executor.instances as a configuration property), while --executor-memory (spark.executor.memory configuration property) and --executor-cores (spark.executor.cores configuration property) control the resources per executor.
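For contrast with dynamic allocation, a static request on YARN might look like the sketch below (jar name and resource sizes are illustrative only):

spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-memory 4g \
  --executor-cores 2 \
  your-app.jar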

How do I turn off dynamic allocation in Spark?

To disable Dynamic Allocation, set spark.dynamicAllocation.enabled to false. You can also specify the upper and lower bound of the resources that should be allocated to your application.
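A sketch of disabling it for a single job from the command line (jar name and executor count are placeholders):

spark-submit \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 4 \
  your-app.jar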



2 Answers

You can see from the log:

INFO util.Utils: Using initial executors = 60, 
max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances

That means Spark will start with max(spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors, spark.executor.instances) executors.

spark.executor.instances is what --num-executors sets.
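To make the rule concrete with hypothetical values: if spark.dynamicAllocation.initialExecutors=1 and spark.dynamicAllocation.minExecutors=1, and you pass --num-executors 60, the starting count is max(1, 1, 60) = 60, which is exactly what the log line above reports.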

answered Oct 22 '22 by GodBlessYou


In your spark-defaults.conf file you can set the following to control the behaviour of dynamic allocation on Spark2:

spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.initialExecutors=1
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=5

If your spark2-submit command does not specify anything, then your job starts with 1 executor and can increase to 5 if required.

If your spark2-submit command specifies the following

--num-executors=3

then your job will start with 3 executors and can still grow to 5 executors if required.
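Putting this scenario together, the full submit command might look like the following sketch (the application jar is a placeholder):

spark2-submit \
  --num-executors=3 \
  your-app.jar

Here 3 wins the max against initialExecutors=1 and minExecutors=1, and dynamic allocation can still scale the job up to the maxExecutors bound of 5.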

Check your log messages for

Using initial executors = [initialExecutors], max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances

Additionally, if you do not specify spark.dynamicAllocation.maxExecutors at all then, given a resource-hungry job, it will continue to use as many executors as it can (on YARN this may be restricted by a limit defined on the queue you submitted your job to). I have seen "rogue" Spark jobs on YARN hog huge amounts of resource on shared clusters, starving other jobs. Your YARN administrators should prevent resource starvation by configuring sensible defaults and splitting different types of workloads across different queues.
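One defensive sketch, with a purely illustrative cap, is to always pass an explicit upper bound per job:

spark2-submit \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  your-app.jar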

I would advise performance testing any changes you make when overriding these defaults, in particular by trying to simulate busy periods of your system.

answered Oct 22 '22 by Brad