I have many services. On a typical day, a few of them are busy for about ten hours, while most of the others are idle or use only a small amount of CPU.
In the past, I put all the services on virtual machines with two CPUs each and scaled by CPU usage. There are two virtual machines at the busiest time, but most of the time there is only one.
| services | instances | busy time in a day | cpu when busy (core/service) | cpu when idle (core/service) | 
|---|---|---|---|---|
| busy services | 2 | 8~12 hours | 0.5~1 | 0.1~0.5 | 
| busy services | 2 | 8~12 hours | 0.3~0.8 | 0.1~0.3 | 
| inactive services | 30 | 0~1 hours | 0.1~0.3 | < 0.1 | 
Now I want to move them to Kubernetes, where each node has two CPUs, and use node autoscaling together with the HPA. For node autoscaling to work, I must set CPU requests for all services, and that is exactly where I ran into difficulty.
These are my settings.
| services | instances | busy time | requests cpu (cpu/service) | total requests cpu | 
|---|---|---|---|---|
| busy services | 2 | 8~12 hours | 300m | 600m | 
| busy services | 2 | 8~12 hours | 300m | 600m | 
| inactive services | 30 | 0~1 hours | 100m | 3000m | 
Note: The CPU request for the inactive services is set to 100m because they do not work well with less than 100m when they are busy.
With this setting, the number of nodes is always greater than three, which is too costly. I think the problem is that although these services need 100m of CPU to work properly, they are mostly idle.
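A rough back-of-the-envelope check of these numbers can be sketched as follows. It assumes each 2-CPU node offers its full 2000m to pods; in practice the allocatable CPU is a bit lower and system pods also request some, so the real node count can only be higher. The names `requests_m` and `min_nodes` are illustrative only.

```python
import math

# CPU requests from the table above: (instances, request per instance in millicores)
requests_m = {
    "busy services (row 1)": (2, 300),
    "busy services (row 2)": (2, 300),
    "inactive services": (30, 100),
}

NODE_CPU_M = 2000  # assumption: the whole 2-CPU node is allocatable to pods


def min_nodes(requests, node_cpu_m=NODE_CPU_M):
    """Lower bound on the node count from summed CPU requests.

    Ignores bin-packing, memory, and system pods, so it is optimistic.
    """
    total = sum(count * per_instance for count, per_instance in requests.values())
    return total, math.ceil(total / node_cpu_m)


total_m, nodes = min_nodes(requests_m)
print(f"total requests: {total_m}m, at least {nodes} nodes")
```

The inactive services account for 3000m of the total on their own, which is why they dominate the node count even though they are mostly idle.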
I really hope that all services can autoscale. I think this is the benefit of Kubernetes: it can help me place pods more flexibly. Is my idea wrong? Shouldn't I set a CPU request for an inactive service?
Even if I ignore the inactive services, I find that Kubernetes often has more than two nodes. If I have more active services, the CPU requests will exceed 2000m even in off-peak hours. Is there any solution?
The total CPU capacity of a cluster is the total number of cores across all nodes in the cluster. If you have a 2-node cluster where the first node has 2 cores and the second node has 1 core, the Kubernetes CPU capacity is 3 cores (2 cores + 1 core).
Get the CPU and memory usage of each node with kubectl: the simple `resource-capacity` command with kubectl returns the CPU requests and limits and the memory requests and limits of each node available in the cluster. You can use the `--sort cpu.limit` flag to sort by CPU limit.
CPU limits on Kubernetes are an antipattern. Many people think you need CPU limits on Kubernetes, but this isn't true. In most cases, Kubernetes CPU limits do more harm than good.
If your app starts hitting its CPU limit, Kubernetes starts throttling your container. This means the CPU is artificially restricted, potentially giving your app worse performance. However, it won't be terminated or evicted. You can use a liveness health check to make sure performance has not been impacted.
> I put all services in a virtual machine with two CPUs and scale by CPU usage. There are two virtual machines at the busiest time, but most of the time there is only one.

First, if you have any availability requirements, I recommend always having at least two nodes. If you have only one node and it crashes (e.g. hardware failure or kernel panic), it will take some minutes before this is detected, and some more minutes before a new node is up.
> The CPU request for the inactive services is set to 100m because they do not work well with less than 100m when they are busy.
>
> I think the problem is that although these services require 100m of CPU to work properly, they are mostly idle.

The CPU request is a guaranteed, reserved amount of resources. Here you reserve too much for your almost idle services. Set the CPU request lower, maybe as low as 20m or even 5m. But since these services need more resources during busy periods, set a higher limit so that the container can "burst", and also use the Horizontal Pod Autoscaler for them. With the Horizontal Pod Autoscaler, more replicas are created under load and the traffic is load balanced across all replicas. Also see Managing Resources for Containers.
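To make the interplay between a low request, a higher limit, and the Horizontal Pod Autoscaler concrete, here is a rough sketch. The replica rule is the one documented for the HPA, desired = ceil(current replicas × current utilization / target utilization), where CPU utilization is measured relative to the request. The specific values (20m request, 200m limit, 80% target) and the helper names are assumptions for illustration, not recommendations, and the sketch ignores the HPA's tolerance band and min/max replica bounds.

```python
import math

# Hypothetical resource settings for one mostly idle service (values are assumptions).
resources = {
    "requests": {"cpu": "20m"},  # small reservation, used by the scheduler
    "limits": {"cpu": "200m"},   # ceiling the container may burst up to
}


def millicores(quantity: str) -> int:
    """Parse a CPU quantity such as '20m' or '1' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def desired_replicas(current_replicas: int, avg_usage_m: float,
                     request_m: int, target_utilization_pct: float) -> int:
    """HPA rule: ceil(currentReplicas * currentUtilization / targetUtilization)."""
    current_utilization_pct = 100.0 * avg_usage_m / request_m
    return math.ceil(current_replicas * current_utilization_pct / target_utilization_pct)


request_m = millicores(resources["requests"]["cpu"])

# Idle: 10m average usage per pod is 50% of the 20m request, so one replica is kept.
idle = desired_replicas(1, avg_usage_m=10, request_m=request_m, target_utilization_pct=80)

# Busy: 100m average usage per pod is 500% of the request, so more replicas are requested.
busy = desired_replicas(1, avg_usage_m=100, request_m=request_m, target_utilization_pct=80)

print(idle, busy)
```

Because utilization is computed against the request, a very low request makes the autoscaler react strongly to modest load, so the target percentage and the maximum replica count deserve some tuning.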
This is also true for your "busy services": reserve less CPU and use Horizontal Pod Autoscaling more actively, so that the traffic is spread to more nodes during high load but the cluster can scale down and save cost when the traffic is low.
> I really hope that all services can autoscale. I think this is the benefit of Kubernetes, which can help me assign pods more flexibly. Is my idea wrong?

No, your idea is not wrong. I agree with you.
> Shouldn't I set a CPU request for an inactive service?

It is good practice to always set some value for the request and the limit, at least in a production environment. Scheduling and autoscaling will not work well without resource requests.
> If I have more active services, even in off-peak hours, the CPU requests will exceed 2000m. Is there any solution?

In general, try to use lower resource requests and use Horizontal Pod Autoscaling more actively. This is true for both your "busy services" and your "inactive services".
> I find that Kubernetes often has more than two nodes.

Yes, there are two aspects to this.

First, if you only use two nodes, your environment is probably small, and the Kubernetes control plane probably consists of more nodes and accounts for the majority of the cost. For very small environments, Kubernetes may be expensive, and a serverless alternative such as Google Cloud Run may be more attractive.
Second, availability. It is good to have at least two nodes in case of an abrupt crash (e.g. hardware failure or a kernel panic), so that your service is still available while the node autoscaler brings up a new node. The same applies to the number of replicas for a Deployment: if availability is important, use at least two replicas. When you drain a node, e.g. for maintenance or a node upgrade, the pods are evicted, but replacements are not created on a different node first. The control plane detects that the Deployment (technically the ReplicaSet) has fewer than the desired number of replicas and creates a new Pod. But when a new Pod is created on a new node, the container image must be pulled before the Pod can run. To avoid downtime during these events, use at least two replicas for your Deployment, and use Pod Topology Spread Constraints to make sure that those two replicas run on different nodes.
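As a minimal sketch of that last recommendation, the snippet below builds a Deployment manifest with two replicas and a topology spread constraint keyed on the node hostname, then prints it as JSON (which `kubectl apply -f` accepts as well as YAML). The app name `my-service` and the image `example.com/my-service:1.0` are hypothetical placeholders.

```python
import json

APP = "my-service"  # hypothetical application name

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        "replicas": 2,  # at least two replicas for availability
        "selector": {"matchLabels": {"app": APP}},
        "template": {
            "metadata": {"labels": {"app": APP}},
            "spec": {
                # Spread the replicas across nodes: the pod count per node
                # may differ by at most 1 between any two nodes.
                "topologySpreadConstraints": [{
                    "maxSkew": 1,
                    "topologyKey": "kubernetes.io/hostname",
                    "whenUnsatisfiable": "DoNotSchedule",
                    "labelSelector": {"matchLabels": {"app": APP}},
                }],
                "containers": [{
                    "name": APP,
                    "image": "example.com/my-service:1.0",  # hypothetical image
                }],
            },
        },
    },
}

print(json.dumps(deployment, indent=2))
```

With `DoNotSchedule`, a second replica stays Pending until another node is available; `ScheduleAnyway` is the softer alternative if you prefer the pod to run even when it cannot be spread.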
Note: You might run into the same problem as in "How to use K8S HPA and autoscaler when Pods normally need low CPU but periodically scale". That should be mitigated by an upcoming Kubernetes feature: KEP - Trimaran: Real Load Aware Scheduling.