 

Kubernetes pods hanging in Init state

I am facing a weird issue with my pods. I launch around 20 pods in my environment, and every time some random 3-4 of them hang with Init:0/1 status. On checking the status of such a pod, the init container shows Running (it should terminate once its task is finished), while the app container shows the Waiting/PodInitializing stage. The same init container image and specs are used across all 20 pods, yet the issue hits a different random subset every time. On top of that, when I terminate these stuck pods, they get stuck in the Terminating state.

If I SSH to the node where such a pod is scheduled and run docker ps, it shows the init container as running, but running docker exec against it throws an error that the container doesn't exist. This init container pulls configs from a Consul server, and on checking the volume (path taken from docker inspect), I found that it has pulled all the key-value pairs correctly and saved them under the defined file name. I have also checked resources on all the nodes, and more than enough is available everywhere.

Below is a detailed example of one pod acting like this.

Kubectl Version :

kubectl version 
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.0", GitCommit:"925c127ec6b946659ad0fd596fa959be43f0cc05", GitTreeState:"clean", BuildDate:"2017-12-15T21:07:38Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"} 
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T09:42:01Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"} 

Pods :

kubectl get pods -n dev1|grep -i session-service 
session-service-app-75c9c8b5d9-dsmhp               0/1       Init:0/1           0          10h 
session-service-app-75c9c8b5d9-vq98k               0/1       Terminating        0          11h 

Pod Status :

kubectl describe pods session-service-app-75c9c8b5d9-dsmhp -n dev1 
Name:           session-service-app-75c9c8b5d9-dsmhp 
Namespace:      dev1 
Node:           ip-192-168-44-18.ap-southeast-1.compute.internal/192.168.44.18 
Start Time:     Fri, 27 Apr 2018 18:14:43 +0530 
Labels:         app=session-service-app 
                pod-template-hash=3175746185 
                release=session-service-app 
Status:         Pending 
IP:             100.96.4.240 
Controlled By:  ReplicaSet/session-service-app-75c9c8b5d9 
Init Containers: 
  initpullconsulconfig: 
    Container ID:  docker://c658d59995636e39c9d03b06e4973b6e32f818783a21ad292a2cf20d0e43bb02 
    Image:         shr-u-nexus-01.myops.de:8082/utils/app-init:1.0 
    Image ID:      docker-pullable://shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd 
    Port:          <none> 
    Args: 
      -consul-addr=consul-server.consul.svc.cluster.local:8500 
    State:          Running 
      Started:      Fri, 27 Apr 2018 18:14:44 +0530 
    Ready:          False 
    Restart Count:  0 
    Environment: 
      CONSUL_TEMPLATE_VERSION:  0.19.4 
      POD:                      sand 
      SERVICE:                  session-service-app 
      ENV:                      dev1 
    Mounts: 
      /var/lib/app from shared-volume-sidecar (rw) 
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro) 
Containers: 
  session-service-app: 
    Container ID: 
    Image:          shr-u-nexus-01.myops.de:8082/sand-images/sessionservice-init:sitv12 
    Image ID: 
    Port:           8080/TCP 
    State:          Waiting 
      Reason:       PodInitializing 
    Ready:          False 
    Restart Count:  0 
    Environment:    <none> 
    Mounts: 
      /etc/appenv from shared-volume-sidecar (rw) 
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bthkv (ro) 
Conditions: 
  Type           Status 
  Initialized    False 
  Ready          False 
  PodScheduled   True 
Volumes: 
  shared-volume-sidecar: 
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime) 
    Medium: 
  default-token-bthkv: 
    Type:        Secret (a volume populated by a Secret) 
    SecretName:  default-token-bthkv 
    Optional:    false 
QoS Class:       BestEffort 
Node-Selectors:  <none> 
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s 
                 node.kubernetes.io/unreachable:NoExecute for 300s 
Events:          <none> 

Container Status on Node :

sudo docker ps|grep -i session 
c658d5999563        shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd                                       "/usr/bin/consul-t..."   10 hours ago        Up 10 hours                             k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0 

da120abd3dbb        gcr.io/google_containers/pause-amd64:3.0                                                                                                                      "/pause"                 10 hours ago        Up 10 hours                             k8s_POD_session-service-app-75c9c8b5d9-dsmhp_dev1_c2075f2a-4a18-11e8-88e7-02929cc89ab6_0 

f53d48c7d6ec        shr-u-nexus-01.myops.de:8082/utils/app-init@sha256:7b0692e3f2e96c6e54c2da614773bb860305b79922b79642642c4e76bd5312cd                                       "/usr/bin/consul-t..."   10 hours ago        Up 10 hours                             k8s_initpullconsulconfig_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0 

c26415458d39        gcr.io/google_containers/pause-amd64:3.0                                                                                                                      "/pause"                 10 hours ago        Up 10 hours                             k8s_POD_session-service-app-75c9c8b5d9-vq98k_dev1_42837d12-4a12-11e8-88e7-02929cc89ab6_0 

On running docker exec (same result with kubectl exec) :

sudo docker exec -it c658d5999563 bash 
rpc error: code = 2 desc = containerd: container not found 
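
For reference, the volume check mentioned above can be reproduced with docker inspect; this is a minimal sketch, using the init container ID from the docker ps output:

# Print the mounts of the init container to locate its volume path on the host
sudo docker inspect -f '{{ json .Mounts }}' c658d5999563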
Vivek Kumar asked Apr 28 '18

1 Answer

A Pod can be stuck in Init status for many reasons.

PodInitializing or Init status means that the Pod contains an init container that hasn't finished yet (init containers are specialized containers that run before the app containers in a Pod; they can hold utilities or setup scripts). A status of Init:0/1 means the Pod's one init container has not finished; Init:N/M means the Pod has M init containers, of which N have completed so far.
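
As a minimal sketch of this mechanism (all names and images here are hypothetical, not taken from the question), a Pod with a single init container looks like this; while init-pull-config is running, kubectl get pods reports the pod as Init:0/1:

apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  initContainers:
  - name: init-pull-config        # must exit 0 before the app container is started
    image: busybox:1.28
    command: ['sh', '-c', 'echo "pulling config" && sleep 30']
  containers:
  - name: app
    image: nginx:1.13
    ports:
    - containerPort: 8080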


Gathering information

For these scenarios, the best approach is to gather information, since the root cause can be different in every PodInitializing issue (a consolidated command sketch follows the lists below).

  • kubectl describe pods pod-XXX: this command gives you plenty of information about the pod, and lets you check whether there are any meaningful events. Note down the init container name.

  • kubectl logs pod-XXX: prints the logs for a container in the pod (or another specified resource).

  • kubectl logs pod-XXX -c init-container-xxx: this is the most precise option, as it prints the logs of the init container itself. You can get the init container name by describing the pod, then substitute it for "init-container-xxx" (for example, "copy-default-config").

    The output of kubectl logs pod-XXX -c init-container-xxx can reveal meaningful information about the issue. For example, in one case the init container logs showed that the root cause was a failure to download plugins from Jenkins (a timeout); from there you can check the connection config, proxy, and DNS, or just modify the YAML to deploy the container without the plugins.
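
    If you prefer not to scan the describe output, the init container names can also be pulled directly with jsonpath (a minimal sketch; pod-XXX and the namespace are placeholders):

    kubectl get pod pod-XXX -n my-namespace -o jsonpath='{.spec.initContainers[*].name}'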

Additional:

  • kubectl describe node node-XXX: describing the pod gives you the name of its node, which you can then inspect with this command.

  • kubectl get events: lists the cluster events.

  • journalctl -xeu kubelet | tail -n 10: kubelet logs on systemd (journalctl -xeu docker | tail -n 10 for docker).
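
As promised above, here is a consolidated sketch that bundles the gathering steps into one pass (POD and NS are placeholders; the pod name below is taken from the question only as an example, and the journalctl line must be run on the node itself):

POD=session-service-app-75c9c8b5d9-dsmhp
NS=dev1

# Pod details and events
kubectl describe pod "$POD" -n "$NS"

# Logs of every init container in the pod
for INIT in $(kubectl get pod "$POD" -n "$NS" -o jsonpath='{.spec.initContainers[*].name}'); do
  echo "--- logs for init container: $INIT ---"
  kubectl logs "$POD" -n "$NS" -c "$INIT"
done

# Inspect the node the pod was scheduled on
NODE=$(kubectl get pod "$POD" -n "$NS" -o jsonpath='{.spec.nodeName}')
kubectl describe node "$NODE"

# Recent cluster events
kubectl get events -n "$NS"

# On the node itself: kubelet logs
# journalctl -xeu kubelet | tail -n 10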


Solutions

The solution depends on the information gathered. Once a log gives you an insight into the root cause, you can investigate and address that specific root cause.

Some examples:

1 > There, this happened when the init container was deleted; it can be fixed by deleting the pod so that it is recreated, or by redeploying it. The same scenario appears in 1.1.

2 > If you find "bad address 'kube-dns.kube-system'", the PVC may not have been recycled correctly; the solution provided in 2 is to run /opt/kubernetes/bin/kube-restart.sh.

3 > There, a sh file was not found; the solution would be to modify the YAML file, or to remove the container if it is unnecessary.

4 > A FailedSync was found, and it was solved by restarting docker on the node (both this fix and the one in 1 are sketched as commands below).
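
For reference, the fixes from examples 1 and 4 map to commands like these (pod and namespace names are placeholders; deleting a pod managed by a ReplicaSet/Deployment causes the controller to recreate it):

# Example 1: delete the stuck pod so its controller recreates it
kubectl delete pod pod-XXX -n my-namespace

# Example 4: restart docker on the affected node (run on the node itself)
sudo systemctl restart docker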

In general you can modify the YAML (for example, to avoid using an outdated URL), try to recreate the affected resource, or just remove the init container that causes the issue from your deployment. The specific solution, however, depends on the specific root cause.
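
In the asker's situation, where a pod is additionally stuck in Terminating, a last-resort option is a force delete (a sketch using the pod name from the question; note that this only removes the pod from the API server, so the container may keep running on the node until the runtime is cleaned up):

kubectl delete pod session-service-app-75c9c8b5d9-vq98k -n dev1 --grace-period=0 --force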

Toni answered Oct 11 '22