How does pod replica scaling down work in Kubernetes Horizontal Pod Autoscaler?

My understanding is that in Kubernetes, when using the Horizontal Pod Autoscaler (HPA), if the targetCPUUtilizationPercentage field is set to 50% and the average CPU utilization across all pod replicas is above that value, the HPA will create more replicas. Once the average CPU utilization drops below 50% for some time, it will lower the number of replicas.

Here is the part that I am not sure about:
What if the CPU utilization on a pod is 10%, not 0%? Will the HPA still terminate the replica?
10% CPU isn't much, but since it's not 0%, some task is currently running on that pod. If it's a long-running task (several seconds) and the HPA decides to terminate the pod, that task will not finish.

Does the HPA terminate pods only if the CPU utilization on them is 0% or does it terminate them whenever it sees that the value is below targetCPUUtilizationPercentage?

How does HPA decide which pods to remove?
Thank you!

Asked by pkout, Apr 28 '18 19:04


1 Answer

There are two questions here, so let me address them one by one. The first part: if a pod in a replica set is consuming, say, 10% CPU, will Kubernetes kill that pod? The answer is yes, it can. Kubernetes does not look at individual pods but at the average of that metric across all pods in the replica set, so a pod does not have to be at 0% to be removed. The HPA only changes the desired replica count; it does not pick pods based on their individual CPU usage. Scaling down is also gradual rather than immediate.
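To make the averaging concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The function names and the `min_replicas` parameter are my own for illustration, and the sketch deliberately ignores the stabilization window, pods that are not ready, and missing metrics.

```python
import math


def average_utilization(per_pod_utilization):
    """Average CPU utilization (percent of requested CPU) across all pods."""
    return sum(per_pod_utilization) / len(per_pod_utilization)


def desired_replicas(per_pod_utilization, target_utilization,
                     tolerance=0.1, min_replicas=1):
    """Simplified HPA calculation (illustrative, not the real controller code).

    Only the average across pods is used; no single pod's value matters.
    """
    current_replicas = len(per_pod_utilization)
    ratio = average_utilization(per_pod_utilization) / target_utilization

    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    return max(min_replicas, math.ceil(current_replicas * ratio))


if __name__ == "__main__":
    # One pod idles at 10%, but the average (60%) is above the 50% target,
    # so the result is a scale-up, not removal of the quiet pod.
    print(desired_replicas([10, 90, 80, 60], target_utilization=50))

    # All pods are lightly loaded: average 25%, so the count is halved,
    # even though every pod is still doing some work.
    print(desired_replicas([10, 40, 30, 20], target_utilization=50))
```

Note that the result is only a number of replicas. Which pods are actually deleted is decided by the controller that owns them (for example the ReplicaSet), not by the HPA.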

The second part of the question: how does your application shut down gracefully when a pod is about to be killed while it is still serving requests? This is handled by the pod's termination grace period (terminationGracePeriodSeconds), during which the container receives SIGTERM and has time to finish before it is forcibly killed. Even better, implement a preStop hook, which lets you do something like stop accepting incoming requests while still processing the existing ones. The implementation will vary with the language runtime you are using, so I won't go into the details here.
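As one possible illustration in Python, the sketch below finishes the task in progress when SIGTERM arrives and simply stops picking up new ones. The `GracefulWorker` class and its method names are hypothetical, and it assumes the work can be split into discrete tasks that each complete well within the grace period (30 seconds by default).

```python
import signal
import threading


class GracefulWorker:
    """Runs tasks one at a time and stops starting new ones after SIGTERM."""

    def __init__(self):
        self._stopping = threading.Event()
        self.completed = []

    def request_stop(self, signum=None, frame=None):
        # Signal-handler signature; only sets a flag, so the task that is
        # currently running is never interrupted.
        self._stopping.set()

    def install_signal_handler(self):
        # Kubernetes sends SIGTERM to the container when the pod is
        # terminated. Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, tasks):
        for task in tasks:
            if self._stopping.is_set():
                break  # shutting down: do not start another task
            self.completed.append(task())
        return self.completed


if __name__ == "__main__":
    worker = GracefulWorker()
    worker.install_signal_handler()
    worker.run([lambda: "task-1", lambda: "task-2"])
    print(worker.completed)
```

Setting a flag instead of exiting inside the handler is the point of the design: the in-flight task runs to completion, and the process exits on its own before the grace period expires.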

Lastly, one scenario you should consider: what if the VM on which the pod is running goes down abruptly? Then you have no chance to execute the preStop hook at all. For that reason, I think the application needs to be robust enough to handle such failures on its own.

Answered by Vishal Biyani, Nov 13 '22 19:11