I’ve created a CronJob in Kubernetes with schedule (8 * * * *), with the Job’s backoffLimit defaulting to 6 and the Pod’s restartPolicy set to Never; the Pods are deliberately configured to FAIL. As I understand it (for a podSpec with restartPolicy: Never), the Job controller will try to create backoffLimit number of Pods and then mark the Job as Failed, so I expected that there would be 6 Pods in the Error state.
This is the actual Job’s status:
status:
  conditions:
  - lastProbeTime: 2019-02-20T05:11:58Z
    lastTransitionTime: 2019-02-20T05:11:58Z
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
  failed: 5
Why were there only 5 failed Pods instead of 6? Or is my understanding of backoffLimit incorrect?
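For reference, a CronJob matching the setup described above would look roughly like the following. This is only a sketch: the name, image, and the deliberately failing command are placeholders, not the actual manifest from the question.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: failing-cron                  # placeholder name
spec:
  schedule: "8 * * * *"               # minute 8 of every hour, as in the question
  jobTemplate:
    spec:
      # backoffLimit is omitted here, so it defaults to 6
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: fail
            image: busybox            # placeholder image
            command: ["sh", "-c", "exit 1"]   # deliberately fails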
Use backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6. Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s ...) capped at six minutes.
CronJobs create Kubernetes Jobs on a repeating schedule. CronJobs allow you to automate regular tasks like making backups, creating reports, sending emails, or cleanup tasks. CronJobs are created, managed, scaled, and deleted in the same way as Jobs.
The main function of a Job is to create one or more Pods and track whether they succeed. A Job ensures that the specified number of Pods complete successfully; once that number of successful Pod runs is reached, the Job is considered complete.
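To make the retry behaviour concrete, here is a minimal standalone Job sketch (the name, image, and command are illustrative, not taken from the question) that allows up to 4 retries before the controller marks the Job as Failed:

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job            # illustrative name
spec:
  completions: 1               # the Job is complete after 1 successful Pod
  backoffLimit: 4              # give up after 4 failed attempts (the default is 6)
  template:
    spec:
      restartPolicy: Never     # failed Pods are not restarted; the Job controller creates new ones
      containers:
      - name: worker
        image: busybox         # illustrative image
        command: ["sh", "-c", "echo working && exit 0"]

With restartPolicy: Never, every retry is a separate Pod, which is why the failed attempts can be counted as individual Pods in the Error state.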
In short: you might not be seeing all of the created Pods because the schedule period of the CronJob is too short.
As described in the documentation:
Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s …) capped at six minutes. The back-off count is reset if no new failed Pods appear before the Job’s next status check.
If a new Job is scheduled before the Job controller has had a chance to recreate a Pod (keeping in mind the delay after the previous failure), the Job controller starts counting from one again.
I reproduced your issue in GKE using the following .yaml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hellocron
spec:
  schedule: "*/3 * * * *" #Runs every 3 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hellocron
            image: busybox
            args:
            - /bin/cat
            - /etc/os
          restartPolicy: Never
      backoffLimit: 6
  suspend: false
This Job will fail because the file /etc/os doesn't exist.
And here is the output of kubectl describe for one of the Jobs:
Name:           hellocron-1551194280
Namespace:      default
Selector:       controller-uid=b81cdfb8-39d9-11e9-9eb7-42010a9c00d0
Labels:         controller-uid=b81cdfb8-39d9-11e9-9eb7-42010a9c00d0
                job-name=hellocron-1551194280
Annotations:    <none>
Controlled By:  CronJob/hellocron
Parallelism:    1
Completions:    1
Start Time:     Tue, 26 Feb 2019 16:18:07 +0100
Pods Statuses:  0 Running / 0 Succeeded / 6 Failed
Pod Template:
  Labels:  controller-uid=b81cdfb8-39d9-11e9-9eb7-42010a9c00d0
           job-name=hellocron-1551194280
  Containers:
   hellocron:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Args:
      /bin/cat
      /etc/os
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type     Reason                Age   From            Message
  ----     ------                ----  ----            -------
  Normal   SuccessfulCreate      26m   job-controller  Created pod: hellocron-1551194280-4lf6h
  Normal   SuccessfulCreate      26m   job-controller  Created pod: hellocron-1551194280-85khk
  Normal   SuccessfulCreate      26m   job-controller  Created pod: hellocron-1551194280-wrktb
  Normal   SuccessfulCreate      26m   job-controller  Created pod: hellocron-1551194280-6942s
  Normal   SuccessfulCreate      25m   job-controller  Created pod: hellocron-1551194280-662zv
  Normal   SuccessfulCreate      22m   job-controller  Created pod: hellocron-1551194280-6c6rh
  Warning  BackoffLimitExceeded  17m   job-controller  Job has reached the specified backoff limit
Note the delay between the creation of pods hellocron-1551194280-662zv and hellocron-1551194280-6c6rh: that gap is the exponential back-off delay before the final retry.