I don't understand why I'm receiving this error. A new node should definitely be able to accommodate the pod: I'm only requesting 768Mi of memory and 450m of CPU in total, and the instance group that would be autoscaled is of type n1-highcpu-2 (2 vCPU, 1.8 GB of memory).
How could I diagnose this further?
kubectl describe pod:
Name:           initial-projectinitialabcrad-697b74b449-848bl
Namespace:      production
Node:           <none>
Labels:         app=initial-projectinitialabcrad
                appType=abcrad-api
                pod-template-hash=2536306005
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicaSet/initial-projectinitialabcrad-697b74b449
Containers:
  app:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-app:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     250m
      memory:  512Mi
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  nginx:
    Image:      gcr.io/example-project-abcsub/projectinitial-abcrad-nginx:production_6b0b3ddabc68d031e9f7874a6ea49ee9902207bc
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     100m
      memory:  128Mi
    Readiness:  http-get http://:80/api/v1/ping delay=5s timeout=10s period=10s #success=1 #failure=3
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
  cloudsql-proxy:
    Image:      gcr.io/cloudsql-docker/gce-proxy:1.11
    Port:       3306/TCP
    Host Port:  0/TCP
    Command:
      /cloud_sql_proxy
      -instances=example-project-abcsub:us-central1:abcfn-staging=tcp:0.0.0.0:3306
      -credential_file=/secrets/cloudsql/credentials.json
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     100m
      memory:  128Mi
    Mounts:
      /secrets/cloudsql from cloudsql-instance-credentials (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-srv8k (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  cloudsql-instance-credentials:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloudsql-instance-credentials
    Optional:    false
  default-token-srv8k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-srv8k
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Normal   NotTriggerScaleUp  4m (x29706 over 3d)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)
  Warning  FailedScheduling   4m (x18965 over 3d)  default-scheduler   0/4 nodes are available: 3 Insufficient memory, 4 Insufficient cpu.
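For reference, the requests of the three containers above add up to 450m of CPU (250m + 100m + 100m) and 768Mi of memory (512Mi + 128Mi + 128Mi). One way to dig further (a sketch; the namespace and pod name are taken from the output above, adjust as needed) is to compare that total with each node's allocatable capacity and to check the pending pod for scheduling constraints other than resource requests:

# What each node can allocate and what is already requested on it
kubectl describe nodes | grep -A 8 "Allocated resources"

# Allocatable CPU/memory per node (capacity minus system reservations)
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory

# Full spec of the pending pod, to spot nodeSelector/affinity/tolerations
kubectl -n production get pod initial-projectinitialabcrad-697b74b449-848bl -o yaml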
Some background: the cluster autoscaler scales up your cluster of worker nodes when pending pods cannot be scheduled on the existing ones, and scales it down again to save money when nodes are underused. The key thing about pods is that when a pod contains multiple containers, all of them always run together on a single worker node; a pod never spans multiple nodes, so the scheduler must find one node that satisfies the combined requests of all containers plus every other scheduling constraint.
It turned out not to be the resource requests (confusingly, the error message made me assume that) but the pod affinity rule I had defined. With requiredDuringSchedulingIgnoredDuringExecution, the pod may only be placed on a node that already runs a pod matching the label selector, and a freshly added node runs no such pods, so the autoscaler correctly reports that the pod wouldn't fit on a new node:
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: appType
        operator: NotIn
        values:
        - example-api
    topologyKey: kubernetes.io/hostname
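One way to confirm a constraint like this (a sketch, reusing the appType label and the namespace and pod name from above) is to look at which pods on each node actually match the selector; a freshly added, empty node matches nothing, so a required affinity rule can never be satisfied there:

# Show running pods with their appType label and the node they landed on;
# a brand-new node has no pods that could satisfy the required affinity
kubectl -n production get pods -L appType -o wide

# Print the affinity block of the pending pod
kubectl -n production get pod initial-projectinitialabcrad-697b74b449-848bl -o jsonpath='{.spec.affinity}'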
If you are using Kubernetes from a cloud provider such as GKE or EKS, it is also worth taking a look at the cloud provider's resource quotas.
Even when everything in the cluster looks reasonable, Kubernetes can give the same "pod didn't trigger scale-up" error because the project's CPU quota is exhausted and the autoscaler cannot create a new instance. Kubernetes has no visibility into that limitation, so the error message gives no hint of it.
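On GKE, a quick way to check this (a sketch; the project ID and region here are taken from the Cloud SQL instance string in the question, substitute your own) is to list the Compute Engine quotas and their current usage:

# Regional quotas, including the CPUS quota consumed by new nodes
gcloud compute regions describe us-central1 --project example-project-abcsub

# Project-wide quotas
gcloud compute project-info describe --project example-project-abcsub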