
How to increase resolution of CpuUtilization metric of ECS cluster past 1 min mark?

I'm trying to create a robust autoscaling process for my ECS cluster but am running into problems with the resolution of the CpuUtilization metric. I have turned on 'Detailed metrics' for 1-min resolution, but am not able to achieve good scaling results. I am deploying an ML model that takes roughly 1.5s per inference. I am not hitting any memory bottleneck, so I am using CpuUtilization for scaling.

I need fast scaling because once requests start piling up, the response time easily shoots up to 3-5s. Currently, with 'Detailed Metrics' enabled, scale-out takes around 3-5 minutes to start, because 3 datapoints are checked for the 1-min resolution metric. If I had a metric with 5-10s resolution, I could look at 6 datapoints within 30-60s and start the scale-out job faster.

I tried the Lambda, Step Functions and EventBridge approach from this blog, but with it I can only get the task, service and container counts, not CpuUtilization or MemoryUtilization.

Is there a way to get CPU and memory metrics directly from ECS? I know we can use cloudwatch.get_metric_statistics(), but that only returns datapoints that have already been reported to CloudWatch, so it does not help here.

Priyam Mehta asked Sep 18 '25 02:09


1 Answer

You can't change that: the 1-minute resolution is set by AWS. The only way to get finer resolution is to publish your own custom metrics. Custom metrics can have a resolution as fine as 1 second.
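As a rough illustration of the custom-metric route, the sketch below builds a high-resolution datum and sends it with boto3's `put_metric_data`. Setting `StorageResolution` to 1 is what marks the datum as high resolution. The metric name `CpuUtilizationHighRes`, the namespace `Custom/ECS`, the dimension names and the helper names are all assumptions made for this example. How you sample CPU inside the task is left to you.

```python
import datetime


def build_cpu_datum(cluster: str, service: str, cpu_percent: float) -> dict:
    """Build one high-resolution datum for CloudWatch put_metric_data."""
    return {
        "MetricName": "CpuUtilizationHighRes",  # hypothetical metric name
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Timestamp": datetime.datetime.now(datetime.timezone.utc),
        "Value": float(cpu_percent),
        "Unit": "Percent",
        # 1 = high resolution (per-second); 60 = standard resolution.
        "StorageResolution": 1,
    }


def publish(datum: dict, namespace: str = "Custom/ECS") -> None:
    """Send the datum to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_cpu_datum works without boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=[datum])
```

You would call `publish(build_cpu_datum("my-cluster", "my-service", sampled_cpu))` every few seconds from a sidecar or from the application itself. An alarm on a high-resolution metric can then use a period as short as 10 seconds, which is what allows scale-out to trigger in well under a minute.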

Marcin answered Sep 19 '25 18:09