
Pods stuck in PodInitializing state indefinitely

I've got a k8s cronjob that consists of an init container and one app container. If the init container fails, the main container never gets started and the Pod stays in "PodInitializing" indefinitely.

My intent is for the job to fail if the init container fails.

---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: job-name
  namespace: default
  labels:
    run: job-name
spec:
  schedule: "15 23 * * *"
  startingDeadlineSeconds: 60
  concurrencyPolicy: "Forbid"
  successfulJobsHistoryLimit: 30
  failedJobsHistoryLimit: 10
  jobTemplate:
    spec:
      # only try twice
      backoffLimit: 2
      activeDeadlineSeconds: 60
      template:
        spec:
          initContainers:
          - name: init-name
            image: init-image:1.0
          restartPolicy: Never
          containers:
          - name: some-name
            image: someimage:1.0
          restartPolicy: Never

Running kubectl describe on the pod that's stuck results in:

Name:               job-name-1542237120-rgvzl
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               my-node-98afffbf-0psc/10.0.0.0
Start Time:         Wed, 14 Nov 2018 23:12:16 +0000
Labels:             controller-uid=ID
                    job-name=job-name-1542237120
Annotations:        kubernetes.io/limit-ranger:
                      LimitRanger plugin set: cpu request for container elasticsearch-metrics; cpu request for init container elasticsearch-repo-setup; cpu requ...
Status:             Failed
IP:                 10.0.0.0
Controlled By:      Job/job-1542237120
Init Containers:
  init-container-name:
    Container ID:  docker://ID
    Image:         init-image:1.0
    Image ID:      init-imageID
    Port:          <none>
    Host Port:     <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 14 Nov 2018 23:12:21 +0000
      Finished:     Wed, 14 Nov 2018 23:12:32 +0000
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwl5n (ro)
Containers:
  some-name:
    Container ID:  
    Image:         someimage:1.0
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwl5n (ro)
Conditions:
  Type              Status
  Initialized       False 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True
asked Nov 15 '18 by Anderson




3 Answers

To try and figure this out I would run the command:

kubectl get pods (add the -n <namespace> flag if required).

Then copy the pod name and run:

kubectl describe pod {POD_NAME}

That should give you some information as to why it's stuck in the initializing state.
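
For example, with the pod name from the question's output (assuming the default namespace, so no -n flag is needed):

kubectl get pods
kubectl describe pod job-name-1542237120-rgvzl

The Events section at the bottom of the describe output and the init container's State/Reason fields are usually the first things worth checking.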

answered Oct 06 '22 by ajtrichards


A Pod can be stuck in Init status for many reasons.

PodInitializing or Init status means that the Pod contains an init container that hasn't finished yet (init containers are specialized containers that run before the app containers in a Pod; they can contain utilities or setup scripts). If the Pod's status is Init:0/1, its single init container has not finished; more generally, Init:N/M means the Pod has M init containers, of which N have completed so far.
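
As an illustration (hypothetical output, reusing the pod name from the question), a pod whose single init container is still running shows up in kubectl get pods roughly like this:

NAME                        READY   STATUS     RESTARTS   AGE
job-name-1542237120-rgvzl   0/1     Init:0/1   0          15s

If the init container exits with an error and the restartPolicy is Never, the STATUS column typically shows Init:Error instead.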

[Architecture diagram not reproduced]

Gathering information

For these scenarios the best approach is to gather information, as the root cause can be different for every PodInitializing issue (a consolidated sketch applying these commands to the pod from the question follows the lists below).

  • kubectl describe pods pod-XXX: with this command you can get a lot of information about the pod, and you can check whether there is any meaningful event as well. Note down the init container name.

  • kubectl logs pod-XXX: this command prints the logs for a container in a pod or other specified resource.

  • kubectl logs pod-XXX -c init-container-xxx: this is the most precise option, as it prints the logs of the init container itself. You can get the init container name by describing the pod and substitute it for "init-container-xxx", for example "copy-default-config" as below:

    [screenshot: kubectl describe output showing the init container name "copy-default-config"]

    The output of kubectl logs pod-XXX -c init-container-xxx can reveal meaningful information about the issue, for reference:

    [screenshot: init container logs showing a timeout while downloading Jenkins plugins]

    In the logs above we can see that the root cause is that the init container can't download the plugins from Jenkins (timeout). From there we can check the connection configuration, proxy and DNS, or simply modify the yaml to deploy the container without the plugins.

Additional:

  • kubectl describe node node-XXX: describing the pod will give you the name of its node, which you can then inspect with this command.

  • kubectl get events: lists the cluster events.

  • journalctl -xeu kubelet | tail -n 10: kubelet logs on systemd (journalctl -xeu docker | tail -n 1 for docker).
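
Put together, a minimal information-gathering sketch for the pod in the question could be (the init container and node names are taken from the describe output above; add a namespace flag if your pod is not in default):

kubectl describe pod job-name-1542237120-rgvzl
kubectl logs job-name-1542237120-rgvzl -c init-container-name
kubectl describe node my-node-98afffbf-0psc
kubectl get events
journalctl -xeu kubelet | tail -n 10    # run on the node itself, for kubelet logs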


Solutions

The solution depends on the information gathered; once the root cause is found, you can act on it.

When you find a log entry that gives insight into the root cause, investigate that specific root cause.

Some examples:

1 > In that case the issue appeared after the init container was deleted; it can be fixed by deleting the pod so that it is recreated, or by redeploying it. The same scenario is covered in 1.1.

2 > If you find "bad address 'kube-dns.kube-system'", the PVC may not have been recycled correctly; the solution provided in 2 is to run /opt/kubernetes/bin/kube-restart.sh.

3 > There, an sh file was not found; the solution is to modify the yaml file, or to remove the container if it is unnecessary.

4 > A FailedSync event was found, and it was solved by restarting docker on the node.

In general you can modify the yaml (for example to avoid using an outdated URL), try to recreate the affected resource, or simply remove the init container that causes the issue from your deployment. However, the specific solution will depend on the specific root cause.

answered Oct 06 '22 by Toni


I think you may have missed that this is the expected behavior of init containers. The rule is that, if an init container fails, the Pod will not be restarted when restartPolicy is set to Never; otherwise Kubernetes keeps restarting it until it succeeds.

Also:

If the init container fails, the main container never gets started and the Pod stays in "PodInitializing" indefinitely.

According to the documentation:

A Pod cannot be Ready until all Init Containers have succeeded. The ports on an Init Container are not aggregated under a service. A Pod that is initializing is in the Pending state but should have a condition Initializing set to true.

I can see that you tried to change this behavior, but I am not sure whether you can do that with a CronJob; I have only seen examples with Jobs. I am just theorizing here, and if this post did not help you solve your issue I can try to recreate it in a lab environment.
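
As a side note, the yaml in the question declares restartPolicy twice under the pod template spec, which is a duplicate key in the same mapping. A minimal sketch of the jobTemplate with it declared only once (all values taken from the question) would be:

jobTemplate:
  spec:
    backoffLimit: 2              # number of retries before the Job is marked failed
    activeDeadlineSeconds: 60
    template:
      spec:
        restartPolicy: Never     # declared once, at the pod spec level
        initContainers:
        - name: init-name
          image: init-image:1.0
        containers:
        - name: some-name
          image: someimage:1.0

With restartPolicy: Never, a pod whose init container fails is marked Failed (as in the describe output above), and it is then the Job controller that retries by creating a new pod, up to backoffLimit times.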

answered Oct 06 '22 by aurelius