
Limit the number of simultaneously running containers per application in YARN

Suppose a YARN application has long-running tasks (running for an hour or longer). When an MR job starts, it blocks all cluster resources, at least until one container finishes, which can sometimes take a long time.

Is there a way to limit the number of simultaneously running containers? Something along the lines of, e.g., map.vcores.max (per NM, or globally), so that other applications are not blocked.

Any ideas?

P.S. Hadoop 2.3.0

Asked Jul 09 '14 by Ivan Balashov

People also ask

How does YARN allocate containers?

YARN uses the MB of memory and virtual cores per node to allocate and track resource usage. For example, a 5 node cluster with 12 GB of memory allocated per node for YARN has a total memory capacity of 60GB. For a default 2GB container size, YARN has room to allocate 30 containers of 2GB each.

What are running containers in YARN?

A YARN container is a process space where a given task runs in isolation, using resources from the cluster's resource pool. The ResourceManager has the authority to assign containers to applications. The assigned container has a unique ContainerId and is always on a single node.

What is a container in YARN/Hadoop?

Container represents an allocated resource in the cluster. The ResourceManager is the sole authority to allocate any Container to applications. The allocated Container is always on a single node and has a unique ContainerId . It has a specific amount of Resource allocated.


2 Answers

This behaviour can be handled at the framework level rather than in YARN itself.

In MapReduce, mapreduce.job.running.map.limit and mapreduce.job.running.reduce.limit can be used to limit the number of simultaneously running map and reduce tasks.

In Tez, it can be handled using the property tez.am.vertex.max-task-concurrency.

Related JIRAs:
https://issues.apache.org/jira/browse/MAPREDUCE-5583
https://issues.apache.org/jira/browse/TEZ-2914
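
As a rough sketch, in a Hadoop release that ships MAPREDUCE-5583 (later than the 2.3.0 mentioned in the question), the limits could be set per job in the job configuration. The values below are purely illustrative, not recommendations:

    <!-- mapred-site.xml or per-job configuration: cap concurrently running tasks -->
    <property>
      <name>mapreduce.job.running.map.limit</name>
      <value>10</value>   <!-- at most 10 map tasks run at the same time; 0 means no limit -->
    </property>
    <property>
      <name>mapreduce.job.running.reduce.limit</name>
      <value>5</value>    <!-- at most 5 reduce tasks run at the same time; 0 means no limit -->
    </property>

The same values can also be passed per job on the command line with -D, so only the long-running jobs need to be throttled while everything else keeps the defaults.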

Answered Sep 18 '22 by SachinJ

As far as I can see, you cannot directly limit the number of containers; it is determined solely by the available resources. So the best you can do is limit the resources per application.

According to the Fair Scheduler documentation, you can assign your application to a dedicated queue. This gets you a configuration pretty close to what you need, since you can limit the memory or core resources per queue; see the sketch below.
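
A minimal sketch of such a Fair Scheduler allocation file, assuming a hypothetical queue named "longrunning" and made-up resource numbers:

    <!-- fair-scheduler.xml: cap the resources the long-running jobs' queue may occupy -->
    <allocations>
      <queue name="longrunning">                           <!-- hypothetical queue name -->
        <maxResources>20480 mb, 10 vcores</maxResources>   <!-- hard cap for this queue -->
        <maxRunningApps>2</maxRunningApps>                 <!-- optionally also limit concurrent apps -->
      </queue>
    </allocations>

The long-running job would then be submitted to that queue (for MapReduce, e.g. by setting mapreduce.job.queuename=longrunning), leaving the rest of the cluster free for other applications.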

You could also switch to a different scheduler or even implement a custom one, but I don't recommend that: you would step outside a well-tested environment, and the problem hardly justifies that much work.

Answered Sep 22 '22 by Roman Nikitchenko