Yarn container understanding and tuning

Hi, we have recently upgraded from MR1 to YARN. I know that a container is an abstract notion, but I don't understand how many JVM tasks (map, reduce, filter, etc.) one container can spawn. Put another way: is a container reusable across multiple map or reduce tasks? I read the following in the blog post "What is a container in YARN?":

"each mapper and reducer runs on its own container to be accurate!" This means that if I look at the AM logs, I should see the number of allocated containers equal to the number of map tasks (failed or successful) plus the number of reduce tasks. Is that correct?

I know the number of containers changes during the application life cycle, based on AM requests, splits, the scheduler, etc.

But is there a way to request an initial minimum number of containers for a given application? I think one way is to configure a fair-scheduler queue. But is there anything else that can dictate this?

In the case of MR, suppose I have mapreduce.map.memory.mb = 3gb and mapreduce.map.cpu.vcores = 4. I also have yarn.scheduler.minimum-allocation-mb = 1024m and yarn.scheduler.minimum-allocation-vcores = 1.

Does that mean I will get one container with 4 cores or 4 containers with one core?

Also, it's not clear where you can specify mapreduce.map.memory.mb and mapreduce.map.cpu.vcores. Should they be set on the client node, or can they be set per application as well?

Also, from the RM UI or AM UI, is there a way to see the currently assigned containers for a given application?

asked Oct 08 '15 00:10 by nir


1 Answer

  1. A container is a logical entity. It grants an application the right to use a specific amount of resources (memory, CPU, etc.) on a specific host (NodeManager). A container cannot be reused across map and reduce tasks of the same application.

For example, I have a MapReduce application that spawns 10 mappers. [Figure: number of mappers]

I am running this on a single host with 8 vCores (this value is determined by the configuration parameter yarn.nodemanager.resource.cpu-vcores). By default, this is set to 8; see "YarnConfiguration.java":

  /** Number of Virtual CPU Cores which can be allocated for containers.*/
  public static final String NM_VCORES = NM_PREFIX + "resource.cpu-vcores";
  public static final int DEFAULT_NM_VCORES = 8;

Since there are 10 mappers and 1 ApplicationMaster, the total number of containers spawned is 11. [Figure: "Application Attempt" page showing the allocated containers]

So, for each map/reduce task a different container gets launched.
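The count in this example can be checked with a little arithmetic. The sketch below is only illustrative: the class and method names are hypothetical, and it assumes no uber mode, one container per task attempt (so failed or speculative attempts count as well), and one extra container for the ApplicationMaster.

```java
final class ContainerCount {
    /** One container per task attempt, plus one for the ApplicationMaster. */
    static int totalContainers(int mapAttempts, int reduceAttempts) {
        final int applicationMasterContainers = 1;
        return mapAttempts + reduceAttempts + applicationMasterContainers;
    }

    public static void main(String[] args) {
        // The example above: 10 mappers; no reduce tasks are assumed, since 10 + 1 = 11.
        System.out.println(totalContainers(10, 0)); // prints 11
    }
}
```

If a map attempt fails and is retried, the retry runs in a new container, so the number seen in the AM logs can be higher than the number of tasks.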

However, for MapReduce jobs on YARN there is the concept of an uber job, which lets a single container run multiple mappers and 1 reducer (https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml: "CURRENTLY THE CODE CANNOT SUPPORT MORE THAN ONE REDUCE and will ignore larger values.").
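As a rough sketch of when a job is small enough to run as an uber job: the thresholds below mirror the mapred-default.xml defaults for mapreduce.job.ubertask.maxmaps (9) and mapreduce.job.ubertask.maxreduces (1), and the flag stands in for mapreduce.job.ubertask.enable (false by default). The class and method names are hypothetical, and the check is deliberately simplified: the real ApplicationMaster also looks at input size and at whether the tasks' memory and CPU fit in the AM container.

```java
final class UberCheck {
    // Defaults of mapreduce.job.ubertask.maxmaps / maxreduces in mapred-default.xml.
    static final int DEFAULT_MAX_MAPS = 9;
    static final int DEFAULT_MAX_REDUCES = 1;

    /** Simplified test: ignores the input-size and AM-resource checks. */
    static boolean smallEnoughForUber(boolean uberEnabled, int maps, int reduces) {
        return uberEnabled
                && maps <= DEFAULT_MAX_MAPS
                && reduces <= DEFAULT_MAX_REDUCES;
    }

    public static void main(String[] args) {
        // 10 mappers exceed the default limit of 9, so this job would not be uberized.
        System.out.println(smallEnoughForUber(true, 10, 0));
        // A smaller job with 4 mappers and 1 reducer passes this simplified check.
        System.out.println(smallEnoughForUber(true, 4, 1));
    }
}
```

Under these assumptions, the 10-mapper example above would still get one container per task unless the maxmaps limit were raised.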

  2. There is no configuration parameter for specifying a minimum number of containers. It is the responsibility of the ApplicationMaster to request the number of containers it needs.

  3. yarn.scheduler.minimum-allocation-mb determines the minimum memory allocation for each container request (yarn.scheduler.maximum-allocation-mb determines the maximum).

    yarn.scheduler.minimum-allocation-vcores determines the minimum vCore allocation for each container request (yarn.scheduler.maximum-allocation-vcores determines the maximum).

    In your case, you are requesting mapreduce.map.memory.mb = 3 GB (a value of 3072, since the property is expressed in MB) and mapreduce.map.cpu.vcores = 4 (4 vCores).

    So you will get 1 container with 4 vCores for each mapper (assuming yarn.scheduler.maximum-allocation-vcores is >= 4), not 4 containers with 1 vCore each.

  4. The parameters "mapreduce.map.memory.mb" and "mapreduce.map.cpu.vcores" are set in the mapred-site.xml file. If a parameter is not marked "final" there, it can be overridden on the client before the job is submitted, so it can also be set per application.

  5. Yes. The "Application Attempt" page for the application shows the number of allocated containers. See the figure above.
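To make the arithmetic behind the "1 container with 4 vCores" answer concrete, here is a minimal sketch. It assumes the scheduler rounds each request up to a multiple of the minimum allocation (the CapacityScheduler behaves roughly this way; the FairScheduler uses separate increment settings) and that both memory and vCores are enforced, which depends on the scheduler's resource calculator. The class, the method names, and the node size used in main are illustrative assumptions, not part of YARN's API.

```java
final class ContainerMath {
    /** Rounds a requested amount up to the next multiple of the scheduler minimum. */
    static int normalize(int requested, int minimum) {
        int multiples = (requested + minimum - 1) / minimum; // ceiling division
        return Math.max(multiples, 1) * minimum;
    }

    /** How many such containers fit on one node when memory and vCores both count. */
    static int containersPerNode(int nodeMemMb, int nodeVcores,
                                 int containerMemMb, int containerVcores) {
        return Math.min(nodeMemMb / containerMemMb, nodeVcores / containerVcores);
    }

    public static void main(String[] args) {
        // mapreduce.map.memory.mb = 3072 against yarn.scheduler.minimum-allocation-mb = 1024
        int memMb = normalize(3072, 1024);
        // mapreduce.map.cpu.vcores = 4 against yarn.scheduler.minimum-allocation-vcores = 1
        int vcores = normalize(4, 1);
        System.out.println(memMb + " MB and " + vcores + " vCores per map container");

        // Assumed node: 8192 MB and 8 vCores available to containers.
        System.out.println(containersPerNode(8192, 8, memMb, vcores)
                + " map containers fit on the node at once");
    }
}
```

The request is never split: each mapper gets a single container sized to the (rounded) request, and the minimum allocation only affects the rounding.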

answered Sep 28 '22 03:09 by Manjunath Ballur