
AWS SageMaker inference endpoint doesn't scale in with autoscaling

I have an AWS SageMaker inference endpoint with autoscaling enabled, using the SageMakerVariantInvocationsPerInstance target metric. When I send a lot of requests to the endpoint, the number of instances correctly scales out to the maximum instance count. But after I stop sending requests, the number of instances doesn't scale back in to 1, the minimum instance count, even after I waited for many hours. Is there a reason for this behaviour?

Thanks

hovnatan asked Sep 19 '25 02:09



1 Answer

AutoScaling needs a CloudWatch alarm to fire before it scales in. SageMaker doesn't push zero-valued metrics when there's no activity (it just doesn't push anything). As a result, when your workload suddenly ends, the alarm goes into the INSUFFICIENT_DATA state and never triggers the scale-in action.

Workarounds are either:

  1. Have a step scaling policy using the CloudWatch metric math FILL() function for your scale in. This way you can tell CloudWatch "if there's no data, pretend this was the metric value" when evaluating the alarm. This is only possible with step scaling, since target tracking creates the alarms for you (and AutoScaling will periodically recreate them, so any manual changes you make will get deleted).
  2. Have scheduled scaling set the size back down to 1 every evening
  3. Make sure traffic continues at a low level for some time
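
As a rough illustration of workaround 1, here is a minimal Python sketch using boto3. It assumes the endpoint variant is already registered as a scalable target. The endpoint name, variant name, policy and alarm names, threshold, period, and cooldown are all hypothetical placeholders, not values from the question. The helper functions only build the request parameters; `apply_scale_in` is the part that actually calls AWS and needs boto3 plus credentials.

```python
def resource_id(endpoint_name, variant_name):
    # Resource ID format Application Auto Scaling uses for SageMaker variants.
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def scale_in_policy_request(endpoint_name, variant_name, cooldown_seconds=300):
    # Step scaling policy: remove one instance whenever the alarm is breached.
    return {
        "PolicyName": f"{endpoint_name}-{variant_name}-scale-in",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "StepAdjustments": [
                # Applies to any value at or below the alarm threshold.
                {"MetricIntervalUpperBound": 0.0, "ScalingAdjustment": -1}
            ],
            "Cooldown": cooldown_seconds,
            "MetricAggregationType": "Average",
        },
    }


def scale_in_alarm_request(endpoint_name, variant_name, policy_arn, threshold=1.0):
    # Alarm on FILL(m1, 0) so missing datapoints are evaluated as 0
    # instead of leaving the alarm in INSUFFICIENT_DATA.
    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-scale-in",  # hypothetical name
        "Metrics": [
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/SageMaker",
                        "MetricName": "InvocationsPerInstance",
                        "Dimensions": [
                            {"Name": "EndpointName", "Value": endpoint_name},
                            {"Name": "VariantName", "Value": variant_name},
                        ],
                    },
                    "Period": 60,
                    "Stat": "Sum",
                },
                "ReturnData": False,
            },
            {
                "Id": "e1",
                "Expression": "FILL(m1, 0)",
                "Label": "InvocationsPerInstance with gaps treated as 0",
                "ReturnData": True,
            },
        ],
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": threshold,       # assumed value, tune for your workload
        "EvaluationPeriods": 15,      # assumed: 15 quiet minutes before scaling in
        "AlarmActions": [policy_arn],
    }


def apply_scale_in(endpoint_name, variant_name):
    # Requires boto3 and AWS credentials; imported here so the builders
    # above can be inspected without either.
    import boto3

    autoscaling = boto3.client("application-autoscaling")
    cloudwatch = boto3.client("cloudwatch")
    policy = autoscaling.put_scaling_policy(
        **scale_in_policy_request(endpoint_name, variant_name)
    )
    cloudwatch.put_metric_alarm(
        **scale_in_alarm_request(endpoint_name, variant_name, policy["PolicyARN"])
    )
```

Calling `apply_scale_in("my-endpoint", "AllTraffic")` (both names made up) would create the policy and then the alarm that triggers it. Each alarm breach removes one instance, so getting from the maximum count down to 1 takes several breaches separated by the cooldown.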
Shahad answered Sep 23 '25 13:09
