Kubernetes pods failing on "Pod sandbox changed, it will be killed and re-created"

On a Google Container Engine (GKE) cluster, I sometimes see one or more pods failing to start. Looking at the pod's events, I see the following:

Pod sandbox changed, it will be killed and re-created.

If I wait, it just keeps retrying.
If I delete the pod and allow it to be recreated by the Deployment's ReplicaSet, it starts properly.

The behavior is inconsistent.
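For reference, the events of a failing pod can be seen with something like the following (the pod name is just an example):

    kubectl get pods                                          # find the pod that is stuck
    kubectl describe pod my-pod                               # the Events section shows the sandbox message
    kubectl get events --sort-by=.metadata.creationTimestamp  # cluster-wide events in order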

Kubernetes versions 1.7.6 and 1.7.8

Any ideas?

Asked Oct 19 '17 by Eldad Assis

People also ask

What happens when a Kubernetes pod fails?

If a Pod is scheduled to a node that then fails, the Pod is deleted; likewise, a Pod won't survive an eviction due to a lack of resources or Node maintenance.

Does Kubernetes restart pods?

A pod is the smallest unit in Kubernetes (K8s). It is meant to run until it is replaced by a new deployment. Because of this, there is no way to restart a pod in place; instead, it is replaced.

What causes a Kubernetes pod to restart?

A restarting container can indicate problems with memory (see the Out of Memory section), CPU usage, or simply an application exiting prematurely. If a container is being restarted because of CPU usage, try increasing the requested and limit amounts for CPU in the pod spec.


2 Answers

In my case it happened because the memory and CPU limits were set too low.
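A minimal sketch of what raising them can look like in the pod spec (the container name, image and values below are only illustrative, not from my workload):

    apiVersion: v1
    kind: Pod
    metadata:
      name: example
    spec:
      containers:
      - name: app                # illustrative container name
        image: my-app:latest     # illustrative image
        resources:
          requests:
            memory: "256Mi"      # raise requests/limits if the sandbox keeps being recreated
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"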

Answered Sep 18 '22 by Gilad Sharaby


I can see the following message posted on the Google Cloud Status Dashboard:

"We are investigating an issue affecting Google Container Engine (GKE) clusters where after docker crashes or is restarted on a node, pods are unable to be scheduled.

The issue is believed to be affecting all GKE clusters running Kubernetes v1.6.11, v1.7.8 and v1.8.1.

Our Engineering Team suggests: If nodes are on release v1.6.11, please downgrade your nodes to v1.6.10. If nodes are on release v1.7.8, please downgrade your nodes to v1.7.6. If nodes are on v1.8.1, please downgrade your nodes to v1.7.6.

Alternative workarounds are also provided by the Engineering team in this doc. These workarounds are applicable to the customers that are unable to downgrade their nodes."
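If you go the downgrade route, a rough sketch of a node-pool downgrade with gcloud looks like this (the cluster, node-pool and zone names are placeholders; verify the flags and the allowed target versions for your gcloud release before running it):

    gcloud container clusters upgrade my-cluster \
      --node-pool default-pool \
      --cluster-version 1.7.6 \
      --zone us-central1-a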

Answered Sep 19 '22 by Carlos