 

Kubernetes reports "pod didn't trigger scale-up (it wouldn't fit if a new node is added)" even though it would fit?

I don't understand why I'm receiving this error. A new node should definitely be able to accommodate the pod, since I'm only requesting 768Mi of memory and 450m of CPU in total, and the instance group that would be autoscaled is of type n1-highcpu-2 (2 vCPUs, 1.8GB of memory).

How could I diagnose this further?

kubectl describe pod:

Name:           initial-projectinitialabcrad-697b74b449-848bl
Namespace:      production
Node:           <none>
Labels:         app=initial-projectinitialabcrad
                appType=abcrad-api
                pod-template-hash=2536306005
Annotations:    <none>
Status:         Pending
IP:             
Controlled By:  ReplicaSet/initial-projectinitialabcrad-697b74b449
Containers:
  app:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-app:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     250m
      memory:  512Mi
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  nginx:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-nginx:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Readiness:    http-get http://:80/api/v1/ping delay=5s timeout=10s period=10s #success=1 #failure=3
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  cloudsql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.11
    Port:       3306/TCP
    Host Port:  0/TCP
    Command:
      /cloud_sql_proxy
      -instances=example-project-abcsub:us-central1:abcfn-staging=tcp:0.0.0.0:3306
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Mounts:
      /secrets/cloudsql from cloudsql-instance-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  cloudsql-instance-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-instance-credentials
    Optional:    false
  default-token-srv8k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-srv8k
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Normal   NotTriggerScaleUp  4m (x29706 over 3d)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  Warning  FailedScheduling   4m (x18965 over 3d)  default-scheduler   0/4 nodes are available: 3 Insufficient memory, 4 Insufficient cpu.
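
For context, here are a few commands that should show what the scheduler sees per node (a sketch; kubectl top assumes metrics-server is installed):

# Allocatable capacity vs. resources already requested, per node
kubectl describe nodes | grep -A 8 "Allocated resources"

# Live usage (requires metrics-server)
kubectl top nodes

# Recent scheduling failures in this namespace
kubectl get events -n production --field-selector reason=FailedScheduling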
asked Jan 22 '19 by Chris Stryczynski


People also ask

Is it possible to auto scale Kubernetes worker nodes?

Yes. Kubernetes supports autoscaling both for your cluster of worker nodes and for application pods, scaling them up when you need capacity and back down to save money.
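
For example, pod-level autoscaling can be enabled with a single command (a sketch; the deployment name my-app is hypothetical):

kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=80

Node-level scaling is handled separately by the cluster autoscaler, which cloud providers such as GKE enable per node pool.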

Do all containers in a pod run on the same node?

The key thing about pods is that when a pod contains multiple containers, all of them always run on a single worker node; a pod never spans multiple worker nodes.

What happens when a Kubernetes node goes down?

Even when the master node goes down, worker nodes may continue to operate and run the containers already scheduled onto them. However, any applications or pods that were running on the master nodes themselves will go down.


2 Answers

It turned out not to be the hardware requests (the error message confusingly led me to assume it was), but rather a pod affinity rule I had defined:

# nested under the pod template's spec.affinity field
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: appType
        operator: NotIn
        values:
        - example-api
    topologyKey: kubernetes.io/hostname
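
Because the rule is requiredDuringSchedulingIgnoredDuringExecution, the pod may only be placed on a node that already runs at least one pod whose appType label is not example-api. A brand-new node starts out with no such pods, so the rule can never be satisfied there, which is why the autoscaler's simulation reports that the pod "wouldn't fit if a new node is added".

Assuming the intent was "don't share a node with example-api pods" (my assumption, not confirmed by the answer), the equivalent podAntiAffinity rule below is satisfiable on an empty node, so scale-up can proceed; this is a sketch rather than the exact fix:

# nested under the pod template's spec.affinity field
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: appType
        operator: In      # anti-affinity + In: repel pods with this label
        values:
        - example-api
    topologyKey: kubernetes.io/hostname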
answered Oct 11 '22 by Chris Stryczynski


If you are using Kubernetes from a cloud provider like GKE or EKS, it may be worth taking a look at the cloud provider's resource quotas.

Even though everything looked reasonable, Kubernetes gave me the same "pod didn't trigger scale-up" error, and the cause turned out to be an exhausted CPU quota. Kubernetes has no visibility into that limitation, so the error message gives no hint of it.
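
On GKE, one quick way to check is to list the regional quotas and look at the CPUS entry (a sketch; us-central1 is taken from the Cloud SQL flag in the question):

gcloud compute regions describe us-central1 --format="value(quotas)"

The same numbers are visible in the Cloud Console under IAM & Admin > Quotas.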

answered Oct 11 '22 by Ahmed AbouZaid