How are containers created based on vcores and memory in MapReduce2?

I have a tiny cluster composed of 1 master (namenode, secondarynamenode, resourcemanager) and 2 slaves (datanode, nodemanager).

I have set in the yarn-site.xml of the master:

  • yarn.scheduler.minimum-allocation-mb : 512
  • yarn.scheduler.maximum-allocation-mb : 1024
  • yarn.scheduler.minimum-allocation-vcores : 1
  • yarn.scheduler.maximum-allocation-vcores : 2

I have set in the yarn-site.xml of the slaves:

  • yarn.nodemanager.resource.memory-mb : 2048
  • yarn.nodemanager.resource.cpu-vcores : 4

Then in the master, I have set in mapred-site.xml:

  • mapreduce.map.memory.mb : 512
  • mapreduce.map.java.opts : -Xmx500m
  • mapreduce.map.cpu.vcores : 1
  • mapreduce.reduce.memory.mb : 512
  • mapreduce.reduce.java.opts : -Xmx500m
  • mapreduce.reduce.cpu.vcores : 1

So it is my understanding that when running a job, the MapReduce ApplicationMaster will try to create as many containers of 512 MB and 1 vCore as possible on both slaves, which have only 2048 MB and 4 vCores available each; this gives space for 4 containers on each slave. This is precisely what is happening in my jobs, so no problem so far.

However, when I increase mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores from 1 to 2, there should theoretically be only enough vCores available for creating 2 containers per slave, right? But no, I still have 4 containers per slave.

I then tried to increase mapreduce.map.memory.mb and mapreduce.reduce.memory.mb from 512 to 768. This leaves space for 2 containers (2048/768 = 2).

It doesn't matter if the vCores are set to 1 or 2 for mappers and reducers: this will always produce 2 containers per slave with 768 MB and 4 containers with 512 MB. So what are vCores for? The ApplicationMaster doesn't seem to care.
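The counting I expected can be sketched as follows (these helpers are purely illustrative, not YARN code): counting by memory alone reproduces what I observe, while counting by memory and vCores together is what I expected.

```java
// Minimal sketch of the container-count arithmetic described above.
// Hypothetical helper names; not actual YARN code.
public class ExpectedFit {
    // Counting by memory alone: what I actually observe.
    static int byMemory(int nodeMemMb, int containerMemMb) {
        return nodeMemMb / containerMemMb;
    }

    // Counting by memory AND vCores: what I expected to happen.
    static int byMemoryAndVcores(int nodeMemMb, int nodeVcores,
                                 int containerMemMb, int containerVcores) {
        return Math.min(nodeMemMb / containerMemMb, nodeVcores / containerVcores);
    }

    public static void main(String[] args) {
        System.out.println(byMemory(2048, 512));                 // 4 containers per slave
        System.out.println(byMemoryAndVcores(2048, 4, 512, 2));  // 2 containers per slave
    }
}
```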

Also, when setting the memory to 768 and vCores to 2, I have this info displayed on the nodemanager UI for a mapper container:

nodemanager UI screenshot

The 768 MB has turned into 1024 TotalMemoryNeeded, and the 2 vCores are ignored and displayed as 1 TotalVCoresNeeded.

So to break down the "how does it work" question into multiple questions:

  1. Is only memory used (and vCores ignored) to calculate the number of containers?
  2. Is the mapreduce.map.memory.mb value only a completely abstract value for calculating the number of containers (and is that why it can be rounded up to the next power of 2)? Or does it represent a real memory allocation in some way?
  3. Why do we specify an -Xmx value in mapreduce.map.java.opts? Why doesn't YARN use the value from mapreduce.map.memory.mb to allocate memory to the container?
  4. What is TotalVCoresNeeded and why is it always equal to 1? I tried to change mapreduce.map.cpu.vcores on all nodes (master and slaves) but it never changes.
Nicomak asked Oct 13 '15 09:10


1 Answer

I will answer this question on the assumption that the scheduler used is the CapacityScheduler.

The CapacityScheduler uses a ResourceCalculator for calculating the resources needed by an application. There are 2 types of resource calculators:

  1. DefaultResourceCalculator: takes only memory into account when doing resource calculations (i.e. when calculating the number of containers)
  2. DominantResourceCalculator: takes both memory and CPU into account for resource calculations

By default, the CapacityScheduler uses the DefaultResourceCalculator. If you want to use the DominantResourceCalculator, you need to set the following property in the capacity-scheduler.xml file:

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

Now, to answer your questions:

  1. If the DominantResourceCalculator is used, then both memory and vCores are taken into account for calculating the number of containers

  2. mapreduce.map.memory.mb is not an abstract value. It is taken into consideration while calculating the resources.

The DominantResourceCalculator class has a normalize() function, which normalizes the resource request using a minimumResource (determined by the config yarn.scheduler.minimum-allocation-mb), a maximumResource (determined by the config yarn.scheduler.maximum-allocation-mb) and a step factor (also determined by yarn.scheduler.minimum-allocation-mb; the step factor equals the minimum allocation).

    The code for normalizing memory looks like below (Check org.apache.hadoop.yarn.util.resource.DominantResourceCalculator.java):

    int normalizedMemory = Math.min(roundUp(
        Math.max(r.getMemory(), minimumResource.getMemory()),
        stepFactor.getMemory()), maximumResource.getMemory());
    

Where:

r = Requested memory

The logic works like below:

a. Take the max of (requested resource, minimum resource) = max(768, 512) = 768

b. roundUp(768, stepFactor) = roundUp(768, 512) = 1024

Roundup does: ((768 + (512 - 1)) / 512) * 512 = (1279 / 512) * 512 = 2 * 512 = 1024 (integer division)

c. min(roundUp result, maximumResource) = min(1024, 1024) = 1024

So finally, the allotted memory is 1024 MB, which is what you are getting.

For the sake of simplicity, you can say that roundUp increments the demand in steps of 512 MB (which is the minimum allocation).
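The normalization above can be reproduced with a small self-contained sketch (an illustrative re-implementation of the logic, not the actual YARN class):

```java
// Illustrative re-implementation of the memory normalization described above
// (mirrors the logic of DominantResourceCalculator.normalize; not the actual class).
public class NormalizeMemory {
    // Round value up to the nearest multiple of stepFactor (integer arithmetic).
    static int roundUp(int value, int stepFactor) {
        return ((value + stepFactor - 1) / stepFactor) * stepFactor;
    }

    // min(roundUp(max(requested, minimum), step), maximum)
    static int normalize(int requested, int minimum, int maximum, int stepFactor) {
        return Math.min(roundUp(Math.max(requested, minimum), stepFactor), maximum);
    }

    public static void main(String[] args) {
        // The question's settings: min = 512, max = 1024, step = min = 512.
        System.out.println(normalize(768, 512, 1024, 512)); // 1024, as seen in the UI
        System.out.println(normalize(512, 512, 1024, 512)); // 512
    }
}
```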

  3. Since the mapper is a Java process, mapreduce.map.java.opts is used to specify the heap size for the mapper, whereas mapreduce.map.memory.mb is the total memory used by the container.

The value of mapreduce.map.java.opts should be less than mapreduce.map.memory.mb.

This answer explains the relation: What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?
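As a hedged example, a common rule of thumb (not mandated by the source) is to set the heap to roughly 80% of the container size, leaving headroom for non-heap memory such as stacks and off-heap buffers. With the question's 512 MB containers, that might look like:

```xml
<!-- mapred-site.xml: keep the heap (-Xmx) below the container size -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <!-- ~80% of 512 MB, leaving room for non-heap overhead -->
  <value>-Xmx410m</value>
</property>
```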

  4. When you use the DominantResourceCalculator, it uses the normalize() function to calculate the vCores needed as well.

    The code for that is (similar to the normalization of memory):

      int normalizedCores = Math.min(roundUp(
          Math.max(r.getVirtualCores(), minimumResource.getVirtualCores()),
          stepFactor.getVirtualCores()), maximumResource.getVirtualCores());
Manjunath Ballur answered Oct 19 '22 03:10