I'm reading the 4th edition of "Hadoop: The Definitive Guide" and came across this explanation of YARN's DRF (Chapter 4, Dominant Resource Fairness):
Imagine a cluster with a total of 100 CPUs and 10 TB of memory. Application A requests containers of (2 CPUs, 300 GB), and application B requests containers of (6 CPUs, 100 GB). A’s request is (2%, 3%) of the cluster, so memory is dominant since its proportion (3%) is larger than CPU’s (2%). B’s request is (6%, 1%), so CPU is dominant. Since B’s container requests are twice as big in the dominant resource (6% versus 3%), it will be allocated half as many containers under fair sharing.
I cannot understand the meaning of "it will be allocated half as many containers under fair sharing". I guess "it" here is Application B, and Application B is allocated half as many containers as Application A. Is that right? Why is Application B allocated fewer containers even though it requires more resources?
Any suggestions or pointers to explanatory documents would be much appreciated. Thank you in advance.
Dominant Resource Fairness (DRF)
DRF uses the concept of the dominant resource to compare multi-dimensional resources.
YARN defines a minimum and a maximum allocation for the resources it schedules, which today are memory and/or cores. Each server running a YARN worker has a NodeManager that offers an allocation of resources (memory and/or cores) that can be used for scheduling.
The CapacityScheduler is designed to allow sharing a large cluster while giving each organization a minimum capacity guarantee. The central idea is that the available resources in the Hadoop Map-Reduce cluster are partitioned among multiple organizations who collectively fund the cluster based on computing needs.
The Dominant Resource Calculator is based on the concept of Dominant Resource Fairness (DRF).
To understand DRF, you can refer to the paper here: https://people.eecs.berkeley.edu/~alig/papers/drf.pdf
In this paper, refer to section 4.1, where an example is given.
DRF tries to equalise the dominant shares of the applications (in this example, A's share of cluster memory = B's share of cluster CPU).
Explanation
Total resources available: 100 CPUs, 10000 GB Memory
Requirements of Application A: 2 CPUs, 300 GB Memory
Requirements of Application B: 6 CPUs, 100 GB Memory
A's dominant resource is Memory (2% of CPUs vs 3% of Memory)
B's dominant resource is CPU (6% of CPUs vs 1% of Memory)
Let's assume that "A" is assigned x containers and "B" is assigned y containers.
Resource requirements of A: 2x CPUs + 300x GB Memory (2 CPUs and 300 GB Memory for each container)
Resource requirements of B: 6y CPUs + 100y GB Memory (6 CPUs and 100 GB Memory for each container)
The total requirements must fit within the cluster:
2x + 6y <= 100 CPUs
300x + 100y <= 10000 GB Memory
DRF will try to equalise the dominant needs of A and B.
A's dominant need: 300x / 10000 GB (300x out of 10000 GB of total memory)
B's dominant need: 6y / 100 CPUs (6y out of 100 CPUs)
DRF will try to equalise: (300x / 10000) = (6y / 100)
Solving the above equation gives: x = 2y
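Substituting x = 2y into the two capacity constraints (2x + 6y <= 100 and 300x + 100y <= 10000) gives 10y <= 100 and 700y <= 10000. CPU is the tighter limit, so the largest solution is y = 10 and therefore x = 20.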
It means:
Application A is allocated 20 containers: (40 CPUs, 6000 GB of Memory)
Application B is allocated 10 containers: (60 CPUs, 1000 GB of Memory)
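The same arithmetic can be checked with a few lines of Python. This is only a sketch of the algebra in this answer, not YARN code; the function names `dominant_fraction` and `equal_share_allocation` and the dictionary keys are made up for illustration, and it assumes exactly two applications that each want as many containers as possible.

```python
from fractions import Fraction


def dominant_fraction(demand, capacity):
    """Largest per-container share of any single cluster resource."""
    return max(Fraction(demand[r], capacity[r]) for r in capacity)


def equal_share_allocation(capacity, demand_a, demand_b):
    """Return (x, y) with equal dominant shares that fit in the cluster."""
    # Equal dominant shares: x * dom_a == y * dom_b, so x == ratio * y.
    ratio = dominant_fraction(demand_b, capacity) / dominant_fraction(demand_a, capacity)
    # Each resource r caps y: (ratio * a_r + b_r) * y <= capacity_r.
    y = min(capacity[r] // (ratio * demand_a[r] + demand_b[r]) for r in capacity)
    return ratio * y, y


capacity = {"cpu": 100, "memory_gb": 10_000}
demand_a = {"cpu": 2, "memory_gb": 300}
demand_b = {"cpu": 6, "memory_gb": 100}

x, y = equal_share_allocation(capacity, demand_a, demand_b)
print(x, y)
```

For the numbers in the question this gives x = 20 and y = 10. Exact fractions are used so that the 3% and 6% shares are not affected by floating-point rounding; if the ratio were not a whole number, x would come out fractional and would need rounding down.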
You can see that:
Total allocated CPU is: 40 + 60 = 100 <= 100 CPUs available
Total allocated Memory is: 6000 + 1000 = 7000 <= 10000 GB of Memory available
So, the above solution explains the meaning of the sentence:
Since B’s container requests are twice as big in the dominant resource (6% versus 3%), it will be allocated half as many containers under fair sharing.
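Another way to see why B ends up with half as many containers is to simulate the allocation one container at a time. The sketch below follows the idea described in the DRF paper: repeatedly give the next container to the application with the smallest dominant share, and stop when that application's request no longer fits. It is a simplified illustration, not the actual YARN scheduler code, and the function name `drf_progressive_filling` and the dictionary layout are assumptions of this example.

```python
from fractions import Fraction


def drf_progressive_filling(capacity, demands):
    """Hand out containers one by one to the app with the lowest dominant share."""
    used = {r: 0 for r in capacity}
    containers = {app: 0 for app in demands}

    def dominant_share(app):
        # Share of the cluster this app holds in its most-used resource.
        return max(
            Fraction(containers[app] * demands[app][r], capacity[r])
            for r in capacity
        )

    while True:
        # Pick the app that is currently furthest behind (ties: first listed).
        app = min(demands, key=dominant_share)
        need = demands[app]
        if any(used[r] + need[r] > capacity[r] for r in capacity):
            return containers  # the next request does not fit: cluster is full
        for r in capacity:
            used[r] += need[r]
        containers[app] += 1


capacity = {"cpu": 100, "memory_gb": 10_000}
demands = {
    "A": {"cpu": 2, "memory_gb": 300},
    "B": {"cpu": 6, "memory_gb": 100},
}
result = drf_progressive_filling(capacity, demands)
print(result)
```

Each container of B raises B's dominant share by 6%, while each container of A raises A's by only 3%, so A gets two turns for every one turn B gets. With the numbers from the question the loop stops at 20 containers for A and 10 for B, when all 100 CPUs are in use.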