AWS Sagemaker inference endpoint doesn't scale in with autoscaling

Question

I have an AWS SageMaker inference endpoint with autoscaling enabled, using the SageMakerVariantInvocationsPerInstance target metric. When I send a lot of requests to the endpoint, the number of instances correctly scales out to the maximum instance count. But after I stop sending requests, the number of instances doesn't scale in to 1, the minimum instance count. I have waited for many hours. Is there a reason for this behaviour?

Thanks

Shahad · Accepted Answer

Auto Scaling needs a CloudWatch alarm to fire before it will scale in. SageMaker doesn't publish zero-valued metrics when there's no activity; it simply publishes nothing. As a result, the scale-in alarm goes into the INSUFFICIENT_DATA state instead of ALARM, so the scale-in action never triggers when your workload suddenly ends.
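To see why the missing datapoints matter, here is a deliberately simplified Python model of a "less than threshold" alarm. It is not CloudWatch's actual evaluation algorithm (which also honours settings such as `TreatMissingData` and `DatapointsToAlarm`); `evaluate_alarm` and `fill` are hypothetical helpers, and the threshold and period count are made-up values. `None` stands for a period in which SageMaker published nothing.

```python
from typing import List, Optional


def fill(datapoints: List[Optional[float]], value: float) -> List[float]:
    """Replace missing periods (None) with a fixed value, like FILL(m, value)."""
    return [value if v is None else v for v in datapoints]


def evaluate_alarm(datapoints: List[Optional[float]], threshold: float,
                   evaluation_periods: int) -> str:
    """Simplified 'LessThanThreshold' alarm over the last N periods.

    None means no datapoint was published for that period.
    """
    window = datapoints[-evaluation_periods:]
    if all(v is None for v in window):
        # Nothing published in the whole window: the alarm can't decide
        return "INSUFFICIENT_DATA"
    if all(v is not None and v < threshold for v in window):
        # Every period breached the threshold: this would fire the scale-in action
        return "ALARM"
    return "OK"


if __name__ == "__main__":
    # A burst of traffic, then the endpoint goes quiet and nothing is published
    invocations = [950.0, 1200.0, 1100.0, None, None, None]
    print(evaluate_alarm(invocations, threshold=1.0, evaluation_periods=3))
    # prints INSUFFICIENT_DATA
    print(evaluate_alarm(fill(invocations, 0.0), threshold=1.0, evaluation_periods=3))
    # prints ALARM
```

The filled series is what the FILL() workaround feeds to the alarm: an idle period becomes a 0 that breaches the threshold, instead of a gap.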

The workarounds are:

Use a step scaling policy for scale-in, with an alarm built on the CloudWatch metric math FILL() function. This way you can tell CloudWatch: "if there's no data, pretend this was the metric value when evaluating the alarm". This is only possible with step scaling, because target tracking creates the alarms for you (and Auto Scaling periodically recreates them, so any manual changes you make get deleted).
Use scheduled scaling to set the size back down to 1 every evening.
Make sure traffic continues at a low level for some time, so that the metric keeps being published.
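The step-scaling workaround could look roughly like the following boto3-oriented sketch. It assumes the variant is already registered as a scalable target (it is, if target tracking is configured) and leaves the existing target-tracking policy in place for scale-out. The endpoint and variant names, the policy and alarm names, the threshold of 1 invocation per 5-minute period, and the 3 evaluation periods are all hypothetical values to adjust. The requests are built as plain dicts so they can be inspected before anything is sent to AWS.

```python
from typing import Any, Dict

ENDPOINT_NAME = "my-endpoint"   # hypothetical endpoint name
VARIANT_NAME = "AllTraffic"     # hypothetical variant name
RESOURCE_ID = f"endpoint/{ENDPOINT_NAME}/variant/{VARIANT_NAME}"
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def scale_in_policy_request() -> Dict[str, Any]:
    """Step scaling policy that sets the variant to exactly 1 instance."""
    return {
        "PolicyName": "scale-in-when-idle",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": SCALABLE_DIMENSION,
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ExactCapacity",
            # Any metric value at or below the alarm threshold -> 1 instance
            "StepAdjustments": [{"MetricIntervalUpperBound": 0.0, "ScalingAdjustment": 1}],
            "Cooldown": 300,
        },
    }


def idle_alarm_request(policy_arn: str) -> Dict[str, Any]:
    """Alarm on total invocations, treating missing periods as 0 via FILL()."""
    return {
        "AlarmName": f"{ENDPOINT_NAME}-idle-scale-in",  # hypothetical name
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 1.0,        # fewer than 1 invocation per period = idle
        "EvaluationPeriods": 3,  # 3 consecutive 5-minute periods
        "AlarmActions": [policy_arn],
        "Metrics": [
            {
                "Id": "invocations",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/SageMaker",
                        "MetricName": "Invocations",
                        "Dimensions": [
                            {"Name": "EndpointName", "Value": ENDPOINT_NAME},
                            {"Name": "VariantName", "Value": VARIANT_NAME},
                        ],
                    },
                    "Period": 300,
                    "Stat": "Sum",
                },
                "ReturnData": False,
            },
            {
                # A period with no datapoint is evaluated as 0 instead of missing
                "Id": "invocations_filled",
                "Expression": "FILL(invocations, 0)",
                "Label": "Invocations (missing filled with 0)",
                "ReturnData": True,
            },
        ],
    }


def apply_idle_scale_in(autoscaling_client: Any, cloudwatch_client: Any) -> str:
    """Create the policy, then the alarm that triggers it; return the alarm name."""
    policy = autoscaling_client.put_scaling_policy(**scale_in_policy_request())
    alarm = idle_alarm_request(policy["PolicyARN"])
    cloudwatch_client.put_metric_alarm(**alarm)
    return alarm["AlarmName"]
```

With credentials configured, you would call it as `apply_idle_scale_in(boto3.client("application-autoscaling"), boto3.client("cloudwatch"))`. Because this alarm is created by you rather than by target tracking, Auto Scaling won't recreate it and overwrite the FILL() expression.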
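For the scheduled-scaling workaround, note that Application Auto Scaling scheduled actions set the minimum and maximum capacity rather than the instance count directly. One way to force the endpoint down to 1 is to cap the maximum at 1 in the evening (Auto Scaling then scales in to the new maximum) and restore it in the morning. The sketch below assumes a hypothetical endpoint/variant, a normal maximum of 4 instances, and times in UTC; `scheduled_actions` and `apply_schedule` are illustrative helpers.

```python
from typing import Any, Dict, List

RESOURCE_ID = "endpoint/my-endpoint/variant/AllTraffic"  # hypothetical endpoint/variant
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"
MAX_INSTANCES = 4  # hypothetical: your normal maximum instance count


def scheduled_actions() -> List[Dict[str, Any]]:
    """Evening: cap capacity at 1 (forces a scale-in). Morning: restore the max."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": SCALABLE_DIMENSION,
    }
    return [
        {**common,
         "ScheduledActionName": "evening-scale-in",
         "Schedule": "cron(0 20 * * ? *)",  # 20:00 UTC every day
         "ScalableTargetAction": {"MinCapacity": 1, "MaxCapacity": 1}},
        {**common,
         "ScheduledActionName": "morning-restore",
         "Schedule": "cron(0 7 * * ? *)",   # 07:00 UTC every day
         "ScalableTargetAction": {"MinCapacity": 1, "MaxCapacity": MAX_INSTANCES}},
    ]


def apply_schedule(autoscaling_client: Any) -> None:
    """Register both scheduled actions with an application-autoscaling client."""
    for action in scheduled_actions():
        autoscaling_client.put_scheduled_action(**action)
```

The trade-off is that, while the evening cap is in effect, the endpoint cannot scale out above 1 instance even if traffic returns.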

Tags: autoscaling, amazon-sagemaker, aws-auto-scaling

hovnatan
