Say we have the following deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  ...
spec:
  replicas: 2
  template:
    spec:
      containers:
        - image: ...
          ...
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
            limits:
              cpu: 500m
              memory: 300Mi
We also create a HorizontalPodAutoscaler object that automatically scales the number of pods up or down based on average CPU utilization. I know that the HPA computes the number of pods relative to the resource requests, but what if I want the containers to be able to use more resources than they request before scaling horizontally?
I have two questions:
1) Are resource limits even used by K8s when an HPA is defined?
2) Can I tell the HPA to scale based on resource limits rather than requests? Or as a means of implementing such a control, can I set the targetUtilization value to be more than 100%?
The Horizontal Pod Autoscaler can automatically scale the number of Pods in your workload based on one or more metrics of the following types: Actual resource usage: when a given Pod's CPU or memory usage exceeds a threshold.
Because the HPA currently uses resources.requests as the base for calculating and comparing resource utilization, setting a target above 100% should not cause any problem as long as the usage implied by the threshold (targetUtilization) is less than or equal to resources.limits. For example, deploy an application with resources.
In addition to horizontal scaling, which adds more pods, Kubernetes also allows vertical scaling, which dynamically adjusts the resources allocated to workloads, such as the RAM or CPU drawn from cluster nodes, to match changing application requirements.
When enabled, the cluster autoscaler algorithm checks for pending pods. The cluster autoscaler requests a newly provisioned node if: 1) there are pending pods due to not having enough available cluster resources to meet their requests and 2) the cluster or node pool has not reached the user-defined maximum node count.
No, the HPA does not look at limits at all. You can set the target utilization to any value, even one higher than 100%.
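To make that concrete, here is a rough sketch of the scaling rule from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), applied to the deployment in the question (100m request, 500m limit). The function names and the sample usage figures are made up for illustration, and the sketch ignores details such as not-ready pods, missing metrics, min/max replica bounds and stabilization windows.

```python
import math


def cpu_utilization_percent(usage_millicores, request_millicores):
    """Average CPU usage across pods, as a percentage of the CPU *request*."""
    total_usage = sum(usage_millicores)
    total_request = request_millicores * len(usage_millicores)
    return 100.0 * total_usage / total_request


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     tolerance=0.1):
    """Simplified form of the documented HPA rule.

    The 0.1 tolerance mirrors the documented default; everything else
    (readiness, min/max replicas, stabilization) is left out on purpose.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * ratio)


def max_reachable_utilization_percent(request_millicores, limit_millicores):
    """CPU is throttled at the limit, so utilization cannot exceed limit/request."""
    return 100.0 * limit_millicores / request_millicores


# Hypothetical usage samples for the two replicas, in millicores.
usage = [450, 470]
utilization = cpu_utilization_percent(usage, request_millicores=100)

print(utilization)                             # 460.0 (percent of the request)
print(desired_replicas(2, utilization, 400))   # target 400% -> 3 replicas
print(desired_replicas(2, utilization, 80))    # target 80%  -> 12 replicas
print(max_reachable_utilization_percent(100, 500))  # 500.0
```

With a 100m request and a 500m limit, a target of 400% means the HPA only scales out once average usage approaches 400m per pod. A target above 500% could never be reached here, because the container is throttled at its CPU limit.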
Hi, in a Deployment we have resource requests and limits. As per the documentation, those parameters take effect before the HPA takes on its main role as autoscaler:
- When you create a Pod, the Kubernetes scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods.
- When the kubelet starts a Container of a Pod, it passes the CPU and memory limits to the container runtime.
- If a Container exceeds its memory limit, it might be terminated. If it is restartable, the kubelet will restart it, as with any other type of runtime failure.
- If a Container exceeds its memory request, it is likely that its Pod will be evicted whenever the node runs out of memory.
On the other hand:
The Horizontal Pod Autoscaler is implemented as a control loop, with a period controlled by the controller manager’s --horizontal-pod-autoscaler-sync-period flag (with a default value of 15 seconds). During each period, the controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition.
Note: if some of the pod’s containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric.
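The note above can be illustrated with a small sketch. It assumes a pod's utilization is its total CPU usage divided by its total CPU request, and the container records (`usage_m`, `request_m`) are a hypothetical shape chosen for this example, not a real Kubernetes API object.

```python
from typing import Dict, List, Optional


def pod_cpu_utilization_percent(containers: List[Dict]) -> Optional[float]:
    """Return a pod's CPU utilization as a percentage of its requests.

    Returns None when any container has no CPU request, mirroring the
    documented behaviour that utilization is then undefined for the pod.
    A request of 0 is treated as missing in this sketch.
    """
    if any(not c.get("request_m") for c in containers):
        return None
    total_usage = sum(c["usage_m"] for c in containers)
    total_request = sum(c["request_m"] for c in containers)
    return 100.0 * total_usage / total_request


# Every container declares a request: utilization is defined.
print(pod_cpu_utilization_percent([{"usage_m": 150, "request_m": 100}]))

# A sidecar without a CPU request makes the pod's utilization undefined,
# so the autoscaler would take no action for this metric.
print(pod_cpu_utilization_percent([
    {"usage_m": 150, "request_m": 100},
    {"usage_m": 20, "request_m": None},
]))
```

So if the HPA seems to ignore a CPU target, it is worth checking that every container in the pod template, sidecars included, sets a CPU request.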
Hope this helps.