I have a tiny cluster composed of 1 master (namenode, secondarynamenode, resourcemanager) and 2 slaves (datanode, nodemanager).
I have set in the yarn-site.xml of the master:

yarn.scheduler.minimum-allocation-mb: 512
yarn.scheduler.maximum-allocation-mb: 1024
yarn.scheduler.minimum-allocation-vcores: 1
yarn.scheduler.maximum-allocation-vcores: 2

I have set in the yarn-site.xml of the slaves:

yarn.nodemanager.resource.memory-mb: 2048
yarn.nodemanager.resource.cpu-vcores: 4

Then in the master, I have set in mapred-site.xml:

mapreduce.map.memory.mb: 512
mapreduce.map.java.opts: -Xmx500m
mapreduce.map.cpu.vcores: 1
mapreduce.reduce.memory.mb: 512
mapreduce.reduce.java.opts: -Xmx500m
mapreduce.reduce.cpu.vcores: 1

So it is my understanding that when running a job, the MapReduce ApplicationMaster will try to create as many containers of 512 MB and 1 vCore as possible on both slaves, each of which has only 2048 MB and 4 vCores available, which gives space for 4 containers on each slave. This is precisely what is happening on my jobs, so no problem so far.
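The arithmetic behind the two behaviors can be sketched as follows (illustrative arithmetic only, not YARN code; the class and method names are made up for this example):

```java
public class ContainerMath {
    // Memory-only accounting (DefaultResourceCalculator-style)
    static int byMemoryOnly(int nodeMemMb, int containerMemMb) {
        return nodeMemMb / containerMemMb;
    }

    // Memory-and-vCores accounting (DominantResourceCalculator-style):
    // the scarcer resource limits the container count
    static int byMemoryAndVcores(int nodeMemMb, int containerMemMb,
                                 int nodeVcores, int containerVcores) {
        return Math.min(nodeMemMb / containerMemMb, nodeVcores / containerVcores);
    }

    public static void main(String[] args) {
        System.out.println(byMemoryOnly(2048, 512));            // 4 containers per slave
        System.out.println(byMemoryAndVcores(2048, 512, 4, 2)); // 2 containers per slave
    }
}
```

With memory-only accounting, the 2-vCore request is invisible, which matches the 4 containers per slave observed below.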
However, when I increase mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores from 1 to 2, there should theoretically be only enough vCores available for creating 2 containers per slave, right? But no, I still have 4 containers per slave.
I then tried to increase mapreduce.map.memory.mb and mapreduce.reduce.memory.mb from 512 to 768. This leaves space for 2 containers (2048/768 = 2).

It doesn't matter if the vCores are set to 1 or 2 for mappers and reducers: this will always produce 2 containers per slave with 768 MB and 4 containers with 512 MB. So what are vCores for? The ApplicationMaster doesn't seem to care.
Also, when setting the memory to 768 and vCores to 2, I have this info displayed on the NodeManager UI for a mapper container: the 768 MB has turned into 1024 TotalMemoryNeeded, and the 2 vCores are ignored and displayed as 1 TotalVCoresNeeded.
So to break down the "how does it work" question into multiple questions:

1. Is the mapreduce.map.memory.mb value only a completely abstract value for calculating the number of containers (and that's why it can be rounded up to the next power of 2)? Or does it represent real memory allocation in some way?
2. What is the role of mapreduce.map.java.opts? Why doesn't YARN use the value from mapreduce.map.memory.mb to allocate memory to the container?
3. What are vCores for? I have tried changing mapreduce.map.cpu.vcores in all nodes (master and slaves) but it never changes anything.

I will answer this question on the assumption that the scheduler used is the CapacityScheduler.
CapacityScheduler uses a ResourceCalculator for calculating the resources needed for an application. There are 2 types of resource calculators: DefaultResourceCalculator, which takes only memory into account, and DominantResourceCalculator, which takes both memory and CPU into account.
By default, the CapacityScheduler uses the DefaultResourceCalculator. If you want to use the DominantResourceCalculator, you need to set the following property in the "capacity-scheduler.xml" file:
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
Now, to answer your questions:
With the default DefaultResourceCalculator, only memory is taken into account, which is why changing the vCores settings had no effect on the number of containers. If the DominantResourceCalculator is used, then both memory and vCores are taken into account for calculating the number of containers.
mapreduce.map.memory.mb is not an abstract value. It is taken into consideration while calculating the resources.
The DominantResourceCalculator class has a normalize() function, which normalizes the resource request using a minimumResource (determined by the config yarn.scheduler.minimum-allocation-mb), a maximumResource (determined by the config yarn.scheduler.maximum-allocation-mb) and a step factor (also determined by the config yarn.scheduler.minimum-allocation-mb).
The code for normalizing memory looks like below (Check org.apache.hadoop.yarn.util.resource.DominantResourceCalculator.java):
int normalizedMemory = Math.min(
    roundUp(Math.max(r.getMemory(), minimumResource.getMemory()),
            stepFactor.getMemory()),
    maximumResource.getMemory());
Where:
r = Requested memory
The logic works like below:

a. Take the max of (requested resource, minimum resource) = max(768, 512) = 768
b. roundUp(768, stepFactor) = roundUp(768, 512) = 1024
   roundUp does: ((768 + (512 - 1)) / 512) * 512 = (1279 / 512) * 512 = 2 * 512 = 1024 (integer division)
c. min(roundUp result, maximumResource) = min(1024, 1024) = 1024
So finally, the allotted memory is 1024 MB, which is what you are getting.
For the sake of simplicity, you can say that roundUp rounds the demand up to the next multiple of 512 MB (the minimum allocation, which is used as the step factor).
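The normalization steps above can be reproduced in a few lines (a minimal sketch that mirrors the logic of the YARN code shown earlier; it does not use the actual YARN classes, and the class name is made up):

```java
public class NormalizeMemory {
    // Round value up to the next multiple of step (integer division)
    static int roundUp(int value, int step) {
        return ((value + step - 1) / step) * step;
    }

    // Clamp the rounded-up request between the minimum and maximum allocation
    static int normalize(int requested, int minimum, int maximum, int stepFactor) {
        return Math.min(roundUp(Math.max(requested, minimum), stepFactor), maximum);
    }

    public static void main(String[] args) {
        // requested=768, min=512, max=1024, step=512 -> 1024, as seen on the UI
        System.out.println(normalize(768, 512, 1024, 512)); // 1024
        // a 512 MB request is already a multiple of the step, so it stays at 512
        System.out.println(normalize(512, 512, 1024, 512)); // 512
    }
}
```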
mapreduce.map.java.opts specifies the heap size of the JVM launched for the task, whereas mapreduce.map.memory.mb is the total memory used by the container. Hence, the value of mapreduce.map.java.opts should be less than mapreduce.map.memory.mb.
The answer here explains that: What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?
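As a quick sanity check of the settings in the question, you can verify that the -Xmx heap fits inside the container memory with some headroom (a hypothetical helper for illustration; parseXmxMb is not a Hadoop API and only handles the simple -XmxNNNm form):

```java
public class HeapCheck {
    // Extract the megabyte value from a simple -XmxNNNm flag in a java.opts string
    static int parseXmxMb(String javaOpts) {
        for (String opt : javaOpts.split("\\s+")) {
            if (opt.startsWith("-Xmx") && opt.endsWith("m")) {
                return Integer.parseInt(opt.substring(4, opt.length() - 1));
            }
        }
        throw new IllegalArgumentException("no -XmxNNNm flag found");
    }

    public static void main(String[] args) {
        int heapMb = parseXmxMb("-Xmx500m");      // from mapreduce.map.java.opts
        int containerMb = 512;                    // from mapreduce.map.memory.mb
        System.out.println(heapMb < containerMb); // true: heap fits, with headroom
    }
}
```

The 12 MB gap between 500 and 512 is the headroom left for non-heap JVM memory; if -Xmx were equal to or larger than the container size, the container could be killed for exceeding its memory limit.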
When you use the DominantResourceCalculator, it uses the same normalize() function to calculate the vCores needed.
The code for that is (similar to normalization of memory):
int normalizedCores = Math.min(
    roundUp(Math.max(r.getVirtualCores(), minimumResource.getVirtualCores()),
            stepFactor.getVirtualCores()),
    maximumResource.getVirtualCores());
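Running the same normalization sketch over vCores with the values from the yarn-site.xml above (min = 1, max = 2, step = 1) shows that a request for 2 vCores would survive normalization intact; the "1 TotalVCoresNeeded" on the UI is therefore consistent with the DefaultResourceCalculator being in use, not with normalize() shrinking the request. (Mirrors, not reuses, the YARN logic; the class name is made up.)

```java
public class NormalizeVcores {
    static int roundUp(int value, int step) {
        return ((value + step - 1) / step) * step;
    }

    static int normalize(int requested, int minimum, int maximum, int stepFactor) {
        return Math.min(roundUp(Math.max(requested, minimum), stepFactor), maximum);
    }

    public static void main(String[] args) {
        // min=1, max=2, step=1 from yarn-site.xml in the question
        System.out.println(normalize(2, 1, 2, 1)); // 2: a 2-vCore request is kept
        System.out.println(normalize(1, 1, 2, 1)); // 1
    }
}
```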