I have batch jobs that I want to run on Kubernetes. The way I understand Jobs: if I choose restartPolicy: Never, then when the Pod fails, the Job replaces it with a new Pod scheduled onto (potentially) another node. If restartPolicy: OnFailure, the container is restarted in the existing Pod. I'd consider a certain number of failures unrecoverable. Is there a way I can prevent the Job from rescheduling or restarting after a certain period of time, and clean up the unrecoverable Jobs?

My current thought for a workaround is to have some watchdog process that looks at retryTimes and cleans up Jobs after a specified number of retries.
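For concreteness, here is a minimal sketch of the kind of Job spec I mean (the name and image are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-batch-job            # placeholder name
spec:
  template:
    spec:
      containers:
        - name: worker                   # placeholder container name
          image: example.com/worker:1.0  # placeholder image
      # Never: a failed Pod is replaced by a new Pod, possibly on another node.
      # OnFailure: the container is restarted inside the existing Pod.
      restartPolicy: Never
```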
Summary of Slack discussion:

No, there is no retry limit. However, you can set a deadline on the job as of v1.2 with activeDeadlineSeconds. The system should back off restarts and then terminate the job when it hits the deadline.
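For example, something along these lines (a sketch; the name, image, and the 600-second deadline are arbitrary placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: deadline-example             # placeholder name
spec:
  activeDeadlineSeconds: 600         # terminate the Job once it has been active for 10 minutes
  template:
    spec:
      containers:
        - name: worker                   # placeholder container name
          image: example.com/worker:1.0  # placeholder image
      restartPolicy: OnFailure
```

Once the deadline is reached, the Job's running Pods are terminated and the Job is marked failed with reason DeadlineExceeded.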
FYI, this has now been added as .spec.backoffLimit.
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
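A sketch of what that looks like (the limit of 4, name, and image are placeholder values; backoffLimit defaults to 6 if unset):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: backoff-example              # placeholder name
spec:
  backoffLimit: 4                    # mark the Job as failed after 4 retries
  activeDeadlineSeconds: 600         # optionally also cap total runtime
  template:
    spec:
      containers:
        - name: worker                   # placeholder container name
          image: example.com/worker:1.0  # placeholder image
      restartPolicy: Never
```

With restartPolicy: Never, failed Pods count toward the limit; with restartPolicy: OnFailure, container restarts are counted as well.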