I have a mix of Spark versions (1.6, 2.0, 2.1) all deployed on YARN (Hadoop 2.6.0 / CDH 5.5). I'm trying to guarantee that a certain application will never be starved of resources on our YARN cluster, regardless of what else may be running there.
I've enabled the shuffle service and set up some Fair Scheduler pools as described in the Spark docs. I created a separate pool for the high-priority application I never want starved of resources, and gave it a minShare of resources:
<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
  <pool name="high_priority">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>24</minShare>
  </pool>
</allocations>
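For reference, this is roughly how I wired the file in on the Spark side, per the docs (the app name and file path below are just placeholders for our actual values):

import org.apache.spark.{SparkConf, SparkContext}

// Enable Spark's fair scheduler and point it at the allocation file above.
val conf = new SparkConf()
  .setAppName("high-priority-app")  // placeholder name
  .set("spark.scheduler.mode", "FAIR")
  .set("spark.scheduler.allocation.file", "/etc/spark/fairscheduler.xml")  // placeholder path

val sc = new SparkContext(conf)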
When I run a Spark application on our YARN cluster, I can see that the pools I configured are recognized:
17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool default, schedulingMode: FAIR, minShare: 0, weight: 1
17/04/04 11:38:20 INFO scheduler.FairSchedulableBuilder: Created pool high_priority, schedulingMode: FAIR, minShare: 24, weight: 1
However, I don't see that my application is using the new high_priority pool, even though I am setting spark.scheduler.pool in my call to spark-submit. So that means when the cluster is pegged by regular activity, my high priority application is not getting the resources it needs:
17/04/04 11:39:49 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
17/04/04 11:39:50 INFO scheduler.FairSchedulableBuilder: Added task set TaskSet_0 tasks to pool default
17/04/04 11:39:50 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
17/04/04 11:40:05 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
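For reference, the documented way to route jobs into a pool is a thread-local property on the SparkContext, something like this (assuming sc is the context from the earlier sketch):

// The pool a job lands in is selected via a thread-local property;
// jobs submitted from this thread after the call below should go to
// the high_priority pool instead of default.
sc.setLocalProperty("spark.scheduler.pool", "high_priority")

// ... submit the jobs that must not be starved ...

// Clearing the property sends later jobs from this thread back to the
// default pool.
sc.setLocalProperty("spark.scheduler.pool", null)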
What am I missing here? My coworkers and I tried enabling preemption in YARN, but that had no effect. Then we realized that YARN has a concept very similar to Spark scheduler pools: YARN queues. So now we're not sure whether the two concepts conflict somehow.
How can we get our high priority pool to work as expected? Is there some kind of conflict between Spark scheduler pools and YARN queues?
Someone over on the spark-users list clarified something that explains why I'm not getting what I expect: Spark scheduler pools are for managing resources within an application, while YARN queues are for managing resources across applications. I need the latter and was mistakenly using the former.
This is explained in the Spark docs under Job Scheduling. I simply got bitten by careless reading, plus a confusion between "job" in the Spark technical sense (i.e. actions within a Spark application) and "job" as my coworkers and I commonly use it, meaning an application submitted to the cluster.
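So the fix for our situation is to target a YARN queue rather than a Spark pool, e.g. via the spark.yarn.queue property (equivalently, the --queue flag on spark-submit). A minimal sketch; "high_priority" here assumes a queue of that name has been defined in the YARN scheduler configuration, with its guaranteed share (and preemption, if desired) set there:

import org.apache.spark.{SparkConf, SparkContext}

// Submit the application to a dedicated YARN queue instead of a Spark
// scheduler pool. The queue "high_priority" is assumed to exist in the
// YARN fair/capacity scheduler config; YARN, not Spark, enforces its
// guaranteed resources across applications.
val conf = new SparkConf()
  .setAppName("high-priority-app")  // placeholder name
  .set("spark.yarn.queue", "high_priority")

val sc = new SparkContext(conf)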