 

Cluster autoscaler not downscaling

I have a regional cluster set up in Google Kubernetes Engine (GKE). The node pool is a single VM in each zone (3 in total). I have a deployment with a minimum of 3 replicas, controlled by an HPA. The node pool is configured for cluster autoscaling (cluster autoscaler, aka CA). The problem scenario:

Update the deployment image. Kubernetes automatically creates new pods, and the CA identifies that a new node is needed; I now have 4. The old pods get removed once all the new pods have started, which means I have the exact same total CPU requests as the minute before. But after the 10-minute maximum scale-down delay I still have 4 nodes.
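For reference, a setup along these lines can be reproduced roughly like this; the cluster name, region and autoscaling bounds are placeholders, not my exact configuration:

    # Regional cluster: one node per zone, with cluster autoscaling on the default pool
    gcloud container clusters create production-cluster \
        --region europe-west1 \
        --num-nodes 1 \
        --enable-autoscaling --min-nodes 1 --max-nodes 2

    # HPA keeping a minimum of 3 replicas of the deployment
    kubectl autoscale deployment my-deploy --min=3 --max=10 --cpu-percent=80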

The CPU requests for the nodes are now:

  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  358m (38%)    138m (14%)  516896Ki (19%)   609056Ki (22%)
  800m (85%)    0 (0%)      200Mi (7%)       300Mi (11%)
  510m (54%)    100m (10%)  410Mi (15%)      770Mi (29%)
  823m (87%)    158m (16%)  484Mi (18%)      894Mi (33%)

The 38% node is running:

Namespace                  Name                                                            CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                                            ------------  ----------  ---------------  -------------
  kube-system                event-exporter-v0.1.9-5c8fb98cdb-8v48h                          0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                fluentd-gcp-v2.0.17-q29t2                                       100m (10%)    0 (0%)      200Mi (7%)       300Mi (11%)
  kube-system                heapster-v1.5.2-585f569d7f-886xx                                138m (14%)    138m (14%)  301856Ki (11%)   301856Ki (11%)
  kube-system                kube-dns-autoscaler-69c5cbdcdd-rk7sd                            20m (2%)      0 (0%)      10Mi (0%)        0 (0%)
  kube-system                kube-proxy-gke-production-cluster-default-pool-0fd62aac-7kls    100m (10%)    0 (0%)      0 (0%)           0 (0%)

I suspect it won't downscale this node because of heapster or kube-dns-autoscaler. But the 85% node contains:

Namespace                  Name                                                            CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                                            ------------  ----------  ---------------  -------------
  kube-system                fluentd-gcp-v2.0.17-s25bk                                       100m (10%)    0 (0%)      200Mi (7%)       300Mi (11%)
  kube-system                kube-proxy-gke-production-cluster-default-pool-7ffeacff-mh6p    100m (10%)    0 (0%)      0 (0%)           0 (0%)
  my-deploy                  my-deploy-54fc6b67cf-7nklb                                      300m (31%)    0 (0%)      0 (0%)           0 (0%)
  my-deploy                  my-deploy-54fc6b67cf-zl7mr                                      300m (31%)    0 (0%)      0 (0%)           0 (0%)

The fluentd and kube-proxy pods are present on every node, so I assume they are not needed once the node is gone. Which means that my deployment pods could be relocated to the other nodes, since each only has a request of 300m (31%, because only 94% of the node's CPU is allocatable).
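To sanity-check those numbers, the allocatable CPU and the summed requests can be read per node; the node name below is just one of mine, as an illustration:

    # Allocatable CPU of the node (roughly 94% of the machine's vCPU)
    kubectl get node gke-production-cluster-default-pool-7ffeacff-mh6p \
        -o jsonpath='{.status.allocatable.cpu}{"\n"}'

    # Per-node request/limit summary (the tables above come from this)
    kubectl describe node gke-production-cluster-default-pool-7ffeacff-mh6p \
        | grep -A 8 'Allocated resources'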

So I figured that I'll check the logs. But if I run kubectl get pods --all-namespaces, no pod for the CA is visible on GKE. And the command kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml only tells me whether it is about to scale, not why or why not. Another option is to look at /var/log/cluster-autoscaler.log on the master node. I SSHed into all 4 nodes and only found a gcp-cluster-autoscaler.log.pos file that says: /var/log/cluster-autoscaler.log 0000000000000000 0000000000000000, meaning the file should be right there but is empty. The last option, according to the FAQ, is to check the events for the pods, but as far as I can tell those are empty.
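For anyone else hunting, these are the checks that are at least possible from kubectl (the grep is just a convenience filter, not an official CA interface):

    # Status ConfigMap written by the autoscaler (per node group: Health/ScaleUp/ScaleDown state)
    kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml

    # Any autoscaler-related events that did make it into the cluster
    kubectl get events --all-namespaces | grep -i -E 'autoscaler|scale'

    # Events recorded on a specific node
    kubectl describe node gke-production-cluster-default-pool-7ffeacff-mh6p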

Does anyone know why it won't downscale, or at least where to find the logs?

Jim Theorell asked Jun 04 '18


People also ask

How cluster Autoscaler scale down?

The scaleDownUtilizationThreshold defines the ratio between requested resources and capacity below which the cluster autoscaler will trigger a scale-down. Our default value is 65%, which means that in order to scale down, one of the nodes has to have lower utilization (CPU/memory) than this threshold.

How does cluster Autoscaler work?

The Cluster Autoscaler automatically adds or removes nodes in a cluster based on resource requests from pods. The Cluster Autoscaler doesn't directly measure CPU and memory usage values to make a scaling decision.

How do I set up Autoscaler cluster?

To enable and configure the cluster autoscaler on the node pool for the cluster, use the --enable-cluster-autoscaler parameter, and specify a node --min-count and --max-count . The cluster autoscaler is a Kubernetes component.


2 Answers

Answering myself for visibility.

The problem is that the CA never considers moving anything unless all of the requirements mentioned in the FAQ are met at the same time. So say I have 100 nodes, each with 51% CPU requests: it still won't consider downscaling.

One solution would be to increase the threshold at which the CA checks, currently 50%. But unfortunately that is not supported by GKE; see the answer from Google support (@GalloCedrone):

Moreover I know that this value might sound too low and someone could be interested to keep as well a 85% or 90% to avoid your scenario. Currently there is a feature request open to give the user the possibility to modify the flag "--scale-down-utilization-threshold", but it is not implemented yet.
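For completeness: if you run the autoscaler yourself (e.g. on a self-managed cluster, not GKE), the flag from the quote is just a container argument on the autoscaler deployment. A rough sketch, assuming the usual kube-system deployment name:

    # Not possible on GKE, where the autoscaler lives on the managed master
    kubectl -n kube-system edit deployment cluster-autoscaler
    # ...then add to the container's command/args, for example:
    #   - --scale-down-utilization-threshold=0.8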

The workaround I found is to decrease the CPU request of the pods (100m instead of 300m) and have the Horizontal Pod Autoscaler (HPA) create more of them on demand. This is fine for me, but if your application is not suited to many small instances you are out of luck. Perhaps a cron job that cordons a node when total utilization is low?
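Roughly, the workaround (and the cordon idea) looks like this; the node name is illustrative:

    # Lower the per-pod request so a node can drop below the 50% threshold,
    # and let the HPA add replicas instead
    kubectl set resources deployment my-deploy --requests=cpu=100m

    # The cron idea: cordon (and optionally drain) an underutilized node by hand
    kubectl cordon gke-production-cluster-default-pool-7ffeacff-mh6p
    kubectl drain gke-production-cluster-default-pool-7ffeacff-mh6p --ignore-daemonsets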

Jim Theorell answered Sep 23 '22


I agree that according to the [Documentation][1] it seems that "gke-name-cluster-default-pool" could be safely deleted. The conditions are:

  • The sum of CPU and memory requests of all pods running on the node is smaller than 50% of the node's allocatable.
  • All pods running on the node (except those that run on all nodes by default, like manifest-run pods or pods created by DaemonSets) can be moved to other nodes.
  • It doesn't have the scale-down disabled annotation (a check is sketched below).

Therefore the CA should remove it after the 10 minutes for which it is considered not needed.
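To verify the third condition, the annotation from the FAQ can be checked (or set) directly on the node; the node name is illustrative:

    # Is scale-down explicitly disabled for this node?
    kubectl describe node gke-production-cluster-default-pool-0fd62aac-7kls \
        | grep scale-down-disabled

    # This is how the annotation would be set (which is what you do NOT want here)
    kubectl annotate node gke-production-cluster-default-pool-0fd62aac-7kls \
        cluster-autoscaler.kubernetes.io/scale-down-disabled=true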

However, checking the [Documentation][2], I found:

What types of pods can prevent CA from removing a node?

[...] kube-system pods that are not run on the node by default [...]

heapster-v1.5.2-... is running on the node, and it is a kube-system pod that is not run on the node by default.
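If it is heapster blocking that node: the same FAQ says kube-system pods only block removal while they have no (or a too restrictive) PodDisruptionBudget, so one option is to give heapster a PDB. A sketch; the label selector is a guess, so check the pod's labels first:

    # Confirm the pod's labels before writing the selector
    kubectl -n kube-system get pods --show-labels | grep heapster

    # Allow one heapster pod to be evicted at a time
    kubectl -n kube-system create poddisruptionbudget heapster-pdb \
        --selector=k8s-app=heapster --max-unavailable=1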

I will update the answer if I discover more interesting information.

UPDATE

The fact that the node is the last one in the zone is not an issue.

To prove it I tested on a brand new cluster with 3 nodes, each in a different zone; one of them had no workload apart from "kube-proxy" and "fluentd" and it was correctly deleted, even though that brought the size of its zone to zero.

[1]: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md
[2]: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node

GalloCedrone answered Sep 21 '22