I want to understand how HPA computes CPU utilization across Pods.
According to this doc, it takes each pod's CPU utilization (averaged over the last minute) divided by the CPU requested by that pod, then computes the arithmetic mean of that value across all pods.
Unfortunately, the doc contains some outdated information: for example, it says that --horizontal-pod-autoscaler-sync-period defaults to 30 seconds, whereas the official doc gives a default of 15 seconds.
When I tested, I noticed that HPA scales up even before the average CPU reaches the threshold I set (90%), which made me think that it may take the maximum CPU across Pods rather than the average.
My question is: where can I find up-to-date documentation that explains exactly how HPA works?
Note that I don't have a Kubernetes cluster at hand; this is a theoretical answer based on the k8s source code.
See if this actually matches your experience.
Kubernetes is open source, and here seems to be the HPA code.
The functions GetResourceReplica and calcPlainMetricReplicas (for metrics that are not utilization percentages) compute the number of replicas given the current metrics.
Both use the usageRatio returned by GetMetricUtilizationRatio; this value is multiplied by the number of currently ready pods in the Replica to get the new number of pods:
New_number_of_pods = Old_numbers_of_ready_pods * usageRatio
There is a tolerance check (i.e. if the usageRatio falls close enough to 1, nothing is done), and the pending and unknown-state pods are ignored (considered to use 0% of the resource), while the pods without metrics are considered to use 100% of the resource.
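The replica computation and tolerance check described above can be sketched roughly as follows. This is an illustration only, not the actual controller code: the function name desired_replicas is hypothetical, the default tolerance of 0.1 and the rounding up are taken from the official HPA documentation rather than from this answer, and the special handling of pending pods and pods without metrics is left out.

```python
import math


def desired_replicas(ready_pods: int, usage_ratio: float, tolerance: float = 0.1) -> int:
    """Return the new pod count for a given usage ratio (hypothetical helper).

    Assumption: tolerance defaults to 0.1 and the result is rounded up,
    as described in the official HPA documentation.
    """
    # Tolerance check: if the ratio is close enough to 1, do nothing.
    if abs(1.0 - usage_ratio) <= tolerance:
        return ready_pods
    # New_number_of_pods = Old_numbers_of_ready_pods * usageRatio, rounded up.
    return math.ceil(ready_pods * usage_ratio)


print(desired_replicas(4, 1.05))  # inside the tolerance band, count unchanged
print(desired_replicas(4, 1.5))   # scale up
print(desired_replicas(3, 0.5))   # scale down
```

The rounding up means that any ratio outside the tolerance band and above 1 adds at least one pod.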
The usageRatio is computed by GetResourceUtilizationRatio, which is passed the metrics and the (resource) requests of all the pods. It goes as follows:
utilization = Total_sum_resource_usage_all_pods / Total_sum_resource_requests_all_pods
usageRatio = utilization * 100 / targetUtilization
where targetUtilization comes from the HPA spec.
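The two formulas above translate into a few lines of Python. This is a sketch under assumptions, not a port of the Go code: compute_usage_ratio is a hypothetical name, and the millicore figures are made-up sample values.

```python
def compute_usage_ratio(usages, requests, target_utilization):
    """Apply the two formulas from the answer (hypothetical helper).

    usages and requests are per-pod values in the same unit (e.g. millicores);
    target_utilization is the percentage from the HPA spec (e.g. 90).
    """
    # utilization = Total_sum_resource_usage_all_pods / Total_sum_resource_requests_all_pods
    utilization = sum(usages) / sum(requests)
    # usageRatio = utilization * 100 / targetUtilization
    return utilization * 100 / target_utilization


# Made-up sample: three pods requesting 500m each, with a 90% target.
ratio = compute_usage_ratio([450, 300, 150], [500, 500, 500], 90)
print(round(ratio, 3))
```

With these sample numbers the pods together use 900m of the 1500m requested, so the ratio is below 1 even though one pod alone sits at 90% of its request.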
The code is easier to read than this summary of mine. In this context, the term request means "resource request" (that's an educated guess).
So I'd say that 90% is the resource usage across all pods, computed as if they were all a single pod requesting the sum of each pod's requests, with the metrics collected as if they were all running on a single dedicated node.
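To make the "single big pod" reading concrete, here is a small comparison with made-up numbers. It assumes the sum-over-sum formula described in this answer; the pods and their values are purely illustrative, and the two results only differ when pods have different requests.

```python
# Hypothetical pods: (usage, request) in millicores, deliberately unequal requests.
pods = [(450, 500), (100, 2000)]

# "Single pod" view from this answer: total usage over total request.
aggregate_pct = sum(u for u, _ in pods) * 100 / sum(r for _, r in pods)

# Alternative reading: mean (or maximum) of each pod's own utilization percentage.
per_pod_pct = [u * 100 / r for u, r in pods]
mean_pct = sum(per_pod_pct) / len(per_pod_pct)
max_pct = max(per_pod_pct)

print(aggregate_pct, mean_pct, max_pct)
```

When every pod has the same request, as is usual for the replicas of one Deployment, the aggregate value and the mean of the per-pod percentages coincide.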