
HPA scale down not happening properly

I have created an HPA for my deployment. It works fine for scaling up to the max replicas (6 in my case). When the load reduces, it scales down to 5, but it is supposed to return to my original replica count (1 in my case) once the load becomes normal. I checked after 30-40 minutes and my application still has 5 replicas. It is supposed to be 1 replica.

[ec2-user@ip-192-168-x-x ~]$ kubectl describe hpa admin-dev -n dev

Name: admin-dev
Namespace: dev
Labels: <none>
Annotations: <none>
CreationTimestamp: Thu, 24 Oct 2019 07:36:32 +0000
Reference: Deployment/admin-dev
Metrics: ( current / target )
resource memory on pods (as a percentage of request): 49% (1285662037333m) / 60%
Min replicas: 1
Max replicas: 10
Deployment pods: 3 current / 3 desired
Conditions:
  Type           Status Reason             Message
  ----           ------ ------             -------
  AbleToScale    True   ReadyForNewScale   recommended size matches current size
  ScalingActive  True   ValidMetricFound   the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited False  DesiredWithinRange the desired count is within the acceptable range 

Events:
  Type   Reason            Age   From                      Message
  ----   ------            ----  ----                      -------
  Normal SuccessfulRescale 13m   horizontal-pod-autoscaler New size: 2; reason: memory resource utilization (percentage of request) above target
  Normal SuccessfulRescale 5m27s horizontal-pod-autoscaler New size: 3; reason: memory resource utilization (percentage of request) above target
Lakshmi Reddy asked Jan 01 '23 13:01


1 Answer

When the load decreases, the HPA intentionally waits a certain amount of time before scaling the app down. This is known as the cooldown delay, and it helps prevent the app from being scaled up and down too frequently. As a result, for a certain time the app keeps running at the previous high replica count even though the metric value is well below the target. This may look like the HPA isn't responding to the decreased load, but it eventually will.

However, the default duration of the cooldown delay is 5 minutes. So if the app still hasn't been scaled down after 30-40 minutes, that is strange, unless the cooldown delay has been set to something else with the --horizontal-pod-autoscaler-downscale-stabilization flag of the kube-controller-manager.
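To make the cooldown behaviour concrete, here is a minimal Python sketch of the idea, not the controller's actual code: the HPA remembers its recent replica recommendations and acts on the highest one seen within the window. The class name `DownscaleStabilizer`, the 15-second evaluation interval, and the exact boundary handling are assumptions made for illustration.

```python
from collections import deque


class DownscaleStabilizer:
    """Hypothetical helper: keeps recent recommendations and returns their maximum."""

    def __init__(self, window_seconds=300):  # 300 s mirrors the 5-minute default
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now, recommendation):
        self._history.append((now, recommendation))
        # Drop recommendations that have aged out of the window (boundary is an assumption).
        while self._history and self._history[0][0] <= now - self.window_seconds:
            self._history.popleft()
        # Scaling down only happens once no higher recommendation remains in the window.
        return max(replicas for _, replicas in self._history)


if __name__ == "__main__":
    stabilizer = DownscaleStabilizer()
    print(0, stabilizer.stabilize(0, 6))       # last moment of high load
    for t in range(15, 301, 15):               # load is gone: raw recommendation is 1
        print(t, stabilizer.stabilize(t, 1))   # stays at 6 until the old entry expires
```

With these assumed numbers the stabilized value stays at the old high count for the whole window and then drops in one step, which is why a delay of a few minutes is expected but 30-40 minutes is not.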

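In the output that you posted, the metric value is 49% with a target of 60% and the current replica count is 3. The HPA computes the desired replica count as ceil(current replicas × current value / target value), which here is ceil(3 × 49 / 60) = ceil(2.45) = 3. So at this metric value the HPA has no reason to scale down, and the output actually looks consistent.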
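The numbers from the posted output can be checked against the scaling formula given in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The Python sketch below is a simplified version of that rule; the function name is made up for illustration, and it ignores details such as unready pods and missing metrics. The 10% tolerance is the documented default.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Simplified HPA rule; hypothetical helper, not the controller's real code."""
    ratio = current_value / target_value
    # Within the tolerance band around the target, the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Values from the `kubectl describe hpa` output: 3 replicas, 49% current, 60% target.
    print(desired_replicas(3, 49, 60))  # ceil(2.45)
    # The situation described in the question: 5 replicas at the same utilisation.
    print(desired_replicas(5, 49, 60))  # ceil(4.08...)
```

Because the result is rounded up, a utilisation of 49% against a 60% target keeps 3 replicas at 3 and 5 replicas at 5; the metric would have to fall further before a replica is removed.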

An issue might be that you're using memory utilisation as the metric, which is generally not a good autoscaling metric.

An autoscaling metric should linearly respond to the current load across the replicas of the app. If the number of replicas is doubled, the metric value should halve, and if the number of replicas is halved, the metric value should double. The memory utilisation in most cases doesn't show this behaviour. For example, if each replica uses a fixed amount of memory, then the average memory utilisation across the replicas stays roughly the same regardless of how many replicas were added or removed. The CPU utilisation generally works much better in this regard.
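The difference between the two kinds of metric can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real workload: the "memory" metric is assumed to sit at 49% per pod no matter how many pods exist, while the "CPU" metric is assumed to be a fixed total amount of work (40% of one pod's request) split evenly across the replicas. All names and numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Simplified HPA rule: ceil(replicas * current / target), with a tolerance band."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def settle(start_replicas, metric_for, target, min_replicas=1, max_replicas=10, max_steps=20):
    """Apply the rule repeatedly until the replica count stops changing."""
    replicas = start_replicas
    for _ in range(max_steps):
        proposed = desired_replicas(replicas, metric_for(replicas), target)
        proposed = max(min_replicas, min(max_replicas, proposed))
        if proposed == replicas:
            break
        replicas = proposed
    return replicas


def fixed_memory(replicas):
    # Assumption: every pod uses 49% of its memory request regardless of replica count.
    return 49.0


def shared_cpu(replicas):
    # Assumption: a constant total load is spread evenly over the replicas.
    return 40.0 / replicas


if __name__ == "__main__":
    print("memory metric settles at:", settle(5, fixed_memory, 60.0))
    print("cpu metric settles at:", settle(5, shared_cpu, 60.0))
```

Under these assumptions the memory-based HPA stays stuck at 5 replicas, because removing pods never changes the per-pod value, while the load-proportional metric lets it walk back down to the minimum of 1.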

weibeld answered Jan 04 '23 01:01