I have a GKE cluster (n1-standard-1, master version 1.13.6-gke.13) with 3 nodes on which I have 7 deployments, each running a Spring Boot application. A default Horizontal Pod Autoscaler was created for each deployment, with target CPU 80% and min 1 / max 5 replicas.
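For reference, each HPA is just the default one, roughly equivalent to this manifest (the deployment name my-app is a placeholder):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app          # placeholder, one HPA per deployment
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app        # placeholder
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80
```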
During normal operation there is typically 1 pod per deployment and CPU usage is at 1-5%. But when an application starts, e.g. after performing a rolling update, the CPU usage spikes and the HPA scales up to the maximum number of replicas, reporting CPU usage at 500% or more.
When multiple deployments start at the same time, e.g. after a cluster upgrade, this often leaves various pods unschedulable because the cluster is out of CPU, and some pods end up in a "Preempting" state.
I have changed the HPAs to a max of 2 replicas, since that is currently enough. But I will be adding more deployments in the future, and it would be nice to know how to handle this correctly. I'm quite new to Kubernetes and GCP, so I'm not sure how to approach this.
Here is the CPU chart for one of the containers after a cluster upgrade earlier today:
Everything runs in the default namespace and I haven't touched the default LimitRange, which sets a default CPU request of 100m. Should I modify this and set limits? Given that initialization is resource demanding, what would the proper limits be? Or do I need to upgrade to a machine type with more CPU?
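If setting explicit requests/limits is the way to go, I assume it would be something like this excerpt of the container spec (the values are just guesses for illustration, not tuned):

```yaml
# Excerpt of a Deployment's pod template; names and values are placeholders
containers:
  - name: my-app
    image: gcr.io/my-project/my-app:latest   # placeholder image
    resources:
      requests:
        cpu: 250m        # illustrative only, not a recommendation
        memory: 512Mi
      limits:
        cpu: "1"         # illustrative only, not a recommendation
        memory: 1Gi
```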
The HPA only takes ready pods into account. Since your pods only experience a CPU spike during startup, your best bet is to configure a readiness probe that either reports the pod as ready only once the CPU usage has come down, or has an initialDelaySeconds set longer than the startup period, so that the startup spike is not taken into account by the HPA.
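A minimal sketch, assuming the application exposes Spring Boot Actuator's /actuator/health endpoint on port 8080 (adjust the path, port, and timings to your app's actual startup time):

```yaml
# Excerpt of the container spec in the Deployment's pod template
readinessProbe:
  httpGet:
    path: /actuator/health   # assumes Spring Boot Actuator is enabled
    port: 8080
  initialDelaySeconds: 60    # longer than the observed startup spike
  periodSeconds: 10
  failureThreshold: 3
```

With this in place the pod is only marked Ready after the probe succeeds, so the startup CPU spike is excluded from the HPA's calculation.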