I've got a k8s CronJob that consists of an init container and one app container. If the init container fails, the main container never gets started and the Pod stays in "PodInitializing" indefinitely.
My intent is for the Job to fail if the init container fails.
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: job-name
  namespace: default
  labels:
    run: job-name
spec:
  schedule: "15 23 * * *"
  startingDeadlineSeconds: 60
  concurrencyPolicy: "Forbid"
  successfulJobsHistoryLimit: 30
  failedJobsHistoryLimit: 10
  jobTemplate:
    spec:
      # only try twice
      backoffLimit: 2
      activeDeadlineSeconds: 60
      template:
        spec:
          initContainers:
          - name: init-name
            image: init-image:1.0
            restartPolicy: Never
          containers:
          - name: some-name
            image: someimage:1.0
            restartPolicy: Never
A kubectl describe on the pod that's stuck results in:
Name:               job-name-1542237120-rgvzl
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               my-node-98afffbf-0psc/10.0.0.0
Start Time:         Wed, 14 Nov 2018 23:12:16 +0000
Labels:             controller-uid=ID
                    job-name=job-name-1542237120
Annotations:        kubernetes.io/limit-ranger:
                      LimitRanger plugin set: cpu request for container elasticsearch-metrics; cpu request for init container elasticsearch-repo-setup; cpu requ...
Status:             Failed
IP:                 10.0.0.0
Controlled By:      Job/job-1542237120
Init Containers:
  init-container-name:
    Container ID:   docker://ID
    Image:          init-image:1.0
    Image ID:       init-imageID
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 14 Nov 2018 23:12:21 +0000
      Finished:     Wed, 14 Nov 2018 23:12:32 +0000
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:  100m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwl5n (ro)
Containers:
  some-name:
    Container ID:
    Image:          someimage:1.0
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:  100m
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-wwl5n (ro)
Conditions:
  Type             Status
  Initialized      False
  Ready            False
  ContainersReady  False
  PodScheduled     True
PodInitializing or Init status means that the Pod contains an init container that hasn't finished yet (init containers are specialized containers that run before app containers in a Pod; they can contain utilities or setup scripts).
If a Pod is stuck in Pending it means that it cannot be scheduled onto a node. Generally this is because there are insufficient resources of one type or another that prevent scheduling.
To try and figure this out I would run the command:
kubectl get pods
- Add the namespace flag (-n) if required.
Then copy the pod name and run:
kubectl describe pod {POD_NAME}
That should give you some information as to why it's stuck in the initializing state.
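If you only want the init container's state rather than the full describe output, something like this should work too (the pod name is a placeholder):
kubectl get pod {POD_NAME} -o jsonpath='{.status.initContainerStatuses[*].state}'
That prints the raw state block (waiting/running/terminated) for each init container, including the exit code and reason when it has failed.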
A Pod can be stuck in Init status for many reasons.
PodInitializing or Init status means that the Pod contains an init container that hasn't finished yet (init containers are specialized containers that run before app containers in a Pod; they can contain utilities or setup scripts). If the Pod's status is Init:0/1, its one init container has not finished yet; Init:N/M means the Pod has M init containers and N have completed so far.
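Purely as an illustration (pod name and age are made up), a Pod whose single init container hasn't finished yet shows up in kubectl get pods roughly like this:
NAME      READY   STATUS     RESTARTS   AGE
my-pod    0/1     Init:0/1   0          2m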
For these scenarios the best approach is to gather information, since the root cause can be different in every PodInitializing issue.
kubectl describe pods pod-XXX
With this command you can get a lot of information about the pod, and you can check whether there are any meaningful events as well. Note down the init container's name.
kubectl logs pod-XXX
This command prints the logs for a container in a pod or other specified resource.
kubectl logs pod-XXX -c init-container-xxx
This is the most precise option, as it prints the logs of the init container itself. You can get the init container's name by describing the pod and substitute it for "init-container-xxx" (for example "copy-default-config").
The output of kubectl logs pod-XXX -c init-container-xxx
can reveal meaningful information about the issue.
In one such case, the root cause was that the init container couldn't download the plugins from Jenkins (timeout); from there you can check the connection configuration, proxy and DNS, or just modify the YAML to deploy the container without the plugins.
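Incidentally, if you just need the init container names to pass to the -c flag, a quick way (pod name is a placeholder) is:
kubectl get pod pod-XXX -o jsonpath='{.spec.initContainers[*].name}'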
Additional:
kubectl describe node node-XXX
Describing the pod will give you the name of its node, which you can also inspect with this command.
kubectl get events
To list the cluster events.
journalctl -xeu kubelet | tail -n 10
Kubelet logs on systemd (journalctl -xeu docker | tail -n 1 for Docker).
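To narrow the event list down to the affected pod, a field selector along these lines can help (pod name and namespace are placeholders):
kubectl get events --field-selector involvedObject.name=pod-XXX -n default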
Solutions
The solution depends on the information gathered once the root cause is found.
When you find a log line that gives insight into the root cause, you can investigate that specific cause.
Some examples:
1 > In one case this happened when the init container was deleted; it can be fixed by deleting the pod so that it is recreated, or by redeploying it. The same scenario appears in 1.1.
2 > If you find "bad address 'kube-dns.kube-system'", the PVC may not have been recycled correctly; the solution provided in 2 is to run /opt/kubernetes/bin/kube-restart.sh.
3 > There, an sh file was not found; the solution would be to modify the YAML file, or remove the container if it is unnecessary.
4 > A FailedSync was found, and it was solved by restarting docker on the node.
In general you can modify the YAML (for example to avoid using an outdated URL), try to recreate the affected resource, or just remove the init container that causes the issue from your deployment. However, the specific solution will depend on the specific root cause.
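For instance, when the fix is simply to recreate the affected pod, that boils down to (pod name is a placeholder):
kubectl delete pod pod-XXX
The owning Job or Deployment will then create a fresh pod, provided it hasn't already exhausted its retries.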
I think you may have missed that this is the expected behavior of init containers. The rule is that, on init container failure, a Pod will not restart if its restartPolicy is set to Never; otherwise Kubernetes will keep restarting it until it succeeds.
Also:
If the init container fails, the Pod in the main container never gets started, and stays in "PodInitializing" indefinitely.
According to the documentation:
A Pod cannot be Ready until all Init Containers have succeeded. The ports on an Init Container are not aggregated under a service. A Pod that is initializing is in the Pending state but should have a condition Initializing set to true.
I can see that you tried to change this behavior, but I am not sure whether you can do that with a CronJob; I have only seen examples with Jobs. But I am just theorizing, and if this post did not help you solve your issue I can try to recreate it in a lab environment.
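As a rough sketch (untested, keeping your names), the restartPolicy would normally sit on the pod spec of the jobTemplate rather than on the individual containers, so that a failed init container fails the whole Pod and the Job's backoffLimit and activeDeadlineSeconds take over:
jobTemplate:
  spec:
    # only try twice
    backoffLimit: 2
    activeDeadlineSeconds: 60
    template:
      spec:
        restartPolicy: Never      # pod-level: a failed init container fails the whole Pod
        initContainers:
        - name: init-name
          image: init-image:1.0
        containers:
        - name: some-name
          image: someimage:1.0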