Suppose a YARN application has long-running tasks (running for 1 hour or longer). When an MR job starts, all cluster resources are blocked, at least until one container finishes, which can sometimes take a long time.
Is there a way to limit the number of simultaneously running containers? Something along the lines of, e.g., map.vcores.max (per NodeManager, or globally), so that other applications are not blocked.
Any ideas?
ps. Hadoop 2.3.0
YARN allocates and tracks resource usage in terms of MB of memory and virtual cores per node. For example, a 5-node cluster with 12 GB of memory allocated to YARN per node has a total memory capacity of 60 GB. With a container size of 2 GB, YARN has room to allocate 30 containers.
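To make the arithmetic concrete, here is a minimal sketch in plain Java, using the figures from the example above (the comment names the standard YARN setting involved; the numbers are illustrative, not defaults):

```java
// Back-of-the-envelope container capacity for the 5-node example above.
// Per-node memory handed to YARN is controlled by yarn.nodemanager.resource.memory-mb;
// the container size stands in for the per-container memory request.
public class ContainerCapacity {
    public static void main(String[] args) {
        int nodes = 5;
        long memoryPerNodeMb = 12 * 1024;  // 12 GB per NodeManager
        long containerSizeMb = 2 * 1024;   // 2 GB per container

        long clusterMemoryMb = nodes * memoryPerNodeMb;          // 61440 MB (~60 GB)
        long maxContainers = clusterMemoryMb / containerSizeMb;  // 30 containers

        System.out.printf("Cluster memory: %d MB -> room for %d containers of %d MB%n",
                clusterMemoryMb, maxContainers, containerSizeMb);
    }
}
```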
A YARN container is an isolated process space in which a given task runs, using resources from the cluster's resource pool. The ResourceManager has the authority to assign any container to an application. An assigned container has a unique ContainerId and always runs on a single node.
Container represents an allocated resource in the cluster. The ResourceManager is the sole authority to allocate any Container to applications. The allocated Container is always on a single node and has a unique ContainerId. It has a specific amount of Resource allocated.
This behaviour/feature can be handled at the framework level rather than in YARN itself.
In MapReduce, mapreduce.job.running.map.limit and mapreduce.job.running.reduce.limit can be used to cap the number of simultaneously running map and reduce tasks (and therefore containers).
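For example (a minimal sketch, not a complete job; the limit values here are arbitrary):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ThrottledJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cap how many map/reduce tasks of this job may run at the same time,
        // no matter how many containers the cluster could otherwise hand out.
        conf.setInt("mapreduce.job.running.map.limit", 10);
        conf.setInt("mapreduce.job.running.reduce.limit", 2);

        Job job = Job.getInstance(conf, "throttled-job");
        // ... set mapper, reducer, input/output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The same properties can also be passed on the command line with -D. Note that these limits were introduced by MAPREDUCE-5583, so they need a Hadoop release that includes that change, which the 2.3.0 mentioned in the question most likely predates.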
In Tez, it can be handled with the property tez.am.vertex.max-task-concurrency.
Related JIRAs:
https://issues.apache.org/jira/browse/MAPREDUCE-5583
https://issues.apache.org/jira/browse/TEZ-2914
As far as I can see, you cannot directly limit the number of containers; it is determined only by the available resources. So the best you can do is limit the resources per application.
According to the Fair Scheduler documentation, you can assign your application to a dedicated queue. That gives you a configuration pretty close to what you are after, since you can limit the memory or vcore resources per queue.
You could also switch to a different scheduler, or even implement a custom one, but I would not recommend it: you would be stepping outside a well-tested environment, and I doubt the problem calls for that much custom work.
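A minimal sketch of the job-side half, assuming a queue named "slow" (a made-up name) has already been defined with a maxResources cap in the Fair Scheduler allocation file:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueuedJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route this job to the capped queue; the actual memory/vcore limits
        // live in the Fair Scheduler allocation file, not here.
        conf.set("mapreduce.job.queuename", "slow");

        Job job = Job.getInstance(conf, "long-running-job");
        // ... job setup as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```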