Multiple pods of a 600-pod deployment are stuck in ContainerCreating
after a rolling update, with the message:
Failed create pod sandbox: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod network: add cmd: failed to assign an IP address to container
What I have tried: checking the CNI metrics, which report:
maxIPAddresses, value: 759.000000
ipamdActionInProgress, value: 1.000000
addReqCount, value: 16093.000000
awsAPILatency, value: 564.000000
delReqCount, value: 32337.000000
eniMaxAvailable, value: 69.000000
assignIPAddresses, value: 558.000000
totalIPAddresses, value: 682.000000
eniAllocated, value: 69.000000
Does the CNI metrics output suggest there's an issue? It seems like there are enough IPs.
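For what it's worth, my own arithmetic on the numbers above (assuming these are aggregated aws-vpc-cni ipamd metrics): 682 totalIPAddresses - 558 assignIPAddresses = 124 IPs still unassigned, and 558 is well under the 759 maxIPAddresses.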
What else can I try to debug?
To resolve it, double-check the pod specification and ensure that the repository and image are specified correctly. If that still doesn't work, there may be a network issue preventing access to the container registry. The output of kubectl describe pod also shows the hostname of the Kubernetes node the pod was scheduled to.
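For example (the pod name below is a placeholder), the Node: field near the top of the output names the node:

    kubectl describe pod <pod-name>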
This is the case on Amazon Elastic Kubernetes Service (EKS), where the maximum number of pods per node depends on the instance type. For example, for a t2.medium instance the maximum number of pods is 17, for t2.small it's 11, and for t2.micro it's 4.
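Those per-type limits follow from how many ENIs, and how many IPv4 addresses per ENI, each instance type supports. A quick sketch of the formula behind the AWS VPC CNI's eni-max-pods list (the ENI and IP counts below come from the EC2 instance-type documentation):

    # max pods = ENIs x (IPv4 addresses per ENI - 1) + 2
    echo $(( 3 * (6 - 1) + 2 ))   # t2.medium: 3 ENIs, 6 IPs each -> 17
    echo $(( 3 * (4 - 1) + 2 ))   # t2.small:  3 ENIs, 4 IPs each -> 11
    echo $(( 2 * (2 - 1) + 2 ))   # t2.micro:  2 ENIs, 2 IPs each -> 4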
Deleting a pod is simple. To delete the pod you have created, just run kubectl delete pod nginx. Be sure to confirm the name of the pod you want to delete before pressing Enter. If the deletion succeeds, pod "nginx" deleted appears in the terminal, as in the session below.
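A minimal session, assuming a pod literally named nginx exists:

    $ kubectl delete pod nginx
    pod "nginx" deleted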
The pod sandbox is the abstraction that replaces the "pause" container that is used to keep namespaces open in every Kubernetes pod today.
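If you want to inspect sandboxes directly, and assuming crictl is installed and configured against the node's CRI runtime, you can list them on the node itself:

    crictl pods

Each entry is one pod's sandbox; the Failed create pod sandbox error above means that object could not be created because the CNI setup call failed.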
It seems that you have reached the maximum number of IP addresses in your subnet, which is what this entry in the documentation suggests:
maxIPAddresses: the maximum number of IP addresses that can be used for Pods in the cluster (this assumes there are enough IPs in the subnet).
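One way to verify this: a sketch using the AWS CLI, assuming you know which subnet(s) your worker nodes use (the subnet ID is a placeholder):

    aws ec2 describe-subnets --subnet-ids <subnet-id> \
        --query 'Subnets[].AvailableIpAddressCount'

If the count is at or near zero, the CNI cannot assign new IPs regardless of what the node-level metrics report.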
Please also take a look at the maxUnavailable and maxSurge parameters, which control how many Pods exist during a rolling update. Maybe your configuration allows more than 600 Pods during the update (for example, 130%), and that hits the limits of your AWS network.
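For example, a strategy like the following (the values are illustrative, not a recommendation) caps how many extra Pods, and therefore extra IPs, a rolling update can demand at once:

    spec:
      replicas: 600
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 10%        # at most 60 Pods above the replica count during the update
          maxUnavailable: 0    # never drop below 600 available Pods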