
AWS ECS Task Memory Hard and Soft Limits

I'm confused about the purpose of having both hard and soft memory limits for ECS task definitions.

IIRC the soft limit is how much memory the scheduler reserves on an instance for the task to run, and the hard limit is how much memory a container can use before it is murdered.

My issue is that if the ECS scheduler allocates tasks to instances based on the soft limit, you could have a situation where a task that is using memory above the soft limit but below the hard limit could cause the instance to exceed its max memory (assuming all other tasks are using memory slightly below or equal to their soft limit).
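
(A quick numeric illustration of what I mean: suppose an instance has 4 GiB of memory and the scheduler places four tasks on it, each with a 1 GiB soft limit and a 2 GiB hard limit. Placement looks fine at 4 × 1 GiB reserved, but if each task actually uses 1.5 GiB, above its soft limit yet below its hard limit, total demand is 6 GiB on a 4 GiB instance.)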

Is this correct?

Thanks

asked Jun 26 '17 by maambmb



2 Answers

If you expect to run a compute workload that is primarily memory bound rather than CPU bound, then you should use only the hard limit, not the soft limit. From the docs:

You must specify a non-zero integer for one or both of memory or memoryReservation in container definitions. If you specify both, memory must be greater than memoryReservation. If you specify memoryReservation, then that value is subtracted from the available memory resources for the container instance on which the container is placed; otherwise, the value of memory is used.

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html
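
For instance, a minimal container definition that sets both values might look like this (the task family, container name, image, and sizes here are illustrative, not taken from the docs):

    {
      "family": "example-task",
      "containerDefinitions": [
        {
          "name": "web",
          "image": "nginx:latest",
          "essential": true,
          "memory": 512,
          "memoryReservation": 256
        }
      ]
    }

Here the 256 MiB memoryReservation is what the scheduler subtracts from the instance's available memory at placement time, while the 512 MiB memory value is the hard limit Docker enforces.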

By specifying only a hard memory limit for your tasks you avoid running out of memory, because ECS stops placing tasks on the instance once its memory is fully allocated, and Docker kills any container that tries to exceed its hard limit.

The soft memory limit is designed for CPU bound applications where you want to reserve a small minimum of memory (the soft limit) but allow occasional bursts up to the hard limit. In this type of CPU heavy workload you don't care much about the exact memory usage of each container, because the containers will run out of CPU long before they exhaust the memory of the instance, so you can place tasks based on CPU reservation and the soft memory limit. In this setup the hard limit is just a failsafe in case something goes out of control or there is a memory leak.

So in summary, you should evaluate your workload using load tests and see whether it tends to run out of CPU first or out of memory first. If you are CPU bound, you can use the soft memory limit with an optional hard limit just as a failsafe (see the sketch below). If you are memory bound, you will need to use just the hard limit with no soft limit.
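
As a sketch of the CPU bound setup described above (the family, image, and numbers are hypothetical):

    {
      "family": "cpu-bound-worker",
      "containerDefinitions": [
        {
          "name": "worker",
          "image": "my-org/worker:latest",
          "essential": true,
          "cpu": 512,
          "memoryReservation": 128,
          "memory": 1024
        }
      ]
    }

Placement is driven by the 512 CPU units (half a vCPU) and the small 128 MiB reservation; the 1024 MiB hard limit exists only as a failsafe against runaway memory use.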

answered Oct 10 '22 by nathanpeck


@nathanpeck is the authority here, but I just wanted to address a specific scenario that you brought up:

My issue is that if the ECS scheduler allocates tasks to instances based on the soft limit, you could have a situation where a task that is using memory above the soft limit but below the hard limit could cause the instance to exceed its max memory (assuming all other tasks are using memory slightly below or equal to their soft limit).

This post from AWS explains what occurs in such a scenario:

If containers try to consume memory between these two values (or between the soft limit and the host capacity if a hard limit is not set), they may compete with each other. In this case, what happens depends on the heuristics used by the Linux kernel's OOM (Out of Memory) killer. ECS and Docker are both uninvolved here; it's the Linux kernel reacting to memory pressure. If something is above its soft limit, it's more likely to be killed than something below its soft limit, but figuring out which process gets killed requires knowing all the other processes on the system and what they are doing with their memory as well. Again, the new memory feature we announced can come to the rescue here. While the OOM behavior isn't changing, now containers can be configured to swap out to disk in a memory pressure scenario. This can potentially alleviate the need for the OOM killer to kick in (if containers are configured to swap).
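
The swap behavior referenced at the end of that quote is configured per container through the linuxParameters block of the container definition (EC2 launch type only; the family, image, and values below are illustrative):

    {
      "family": "swap-example",
      "containerDefinitions": [
        {
          "name": "app",
          "image": "my-org/app:latest",
          "essential": true,
          "memory": 512,
          "memoryReservation": 256,
          "linuxParameters": {
            "maxSwap": 512,
            "swappiness": 60
          }
        }
      ]
    }

With maxSwap set, a container under memory pressure can swap up to 512 MiB to disk instead of immediately triggering the OOM killer; swappiness (0 to 100) tunes how aggressively the kernel swaps the container's pages.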

answered Oct 10 '22 by pavelv