 

Is there a 'max-retries' for Kubernetes Jobs?

I have batch jobs that I want to run on Kubernetes. The way I understand Jobs:

If I choose restartPolicy: Never, then when a Pod fails, the Job starts a replacement Pod, which may be scheduled onto another node. If I choose restartPolicy: OnFailure, the container is restarted inside the existing Pod. I'd consider a certain number of failures unrecoverable. Is there a way to stop the Job from rescheduling or restarting after a certain number of attempts (or a certain period of time) and clean up the unrecoverable Jobs?

My current idea for a workaround is a watchdog process that looks at retryTimes and cleans up Jobs after a specified number of retries.
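The watchdog idea boils down to periodically reading each Job's status and deleting the ones past a threshold. Below is a minimal sketch of just the decision step. It assumes the Job objects were already fetched (for example from `kubectl get jobs -o json`), and it counts failures with the Job's status.failed field rather than the retryTimes mentioned above. MAX_FAILURES and jobs_to_clean_up are hypothetical names, and the sketch only really fits restartPolicy: Never, where each failed Pod is counted in status.failed; with OnFailure the restarts happen inside one Pod and would have to be read from the container's restart count instead.

```python
from typing import Dict, List

# Hypothetical threshold: treat a Job as unrecoverable after this many failed Pods.
MAX_FAILURES = 3


def jobs_to_clean_up(jobs: List[Dict], max_failures: int = MAX_FAILURES) -> List[str]:
    """Return the names of Jobs whose failed-Pod count has reached max_failures.

    Each item is assumed to be a (trimmed-down) Job object as returned by the
    API server, e.g. {"metadata": {"name": ...}, "status": {"failed": ...}}.
    """
    doomed = []
    for job in jobs:
        status = job.get("status", {})
        # "failed" is absent until the first Pod failure, so default to 0.
        failed = status.get("failed", 0)
        if failed >= max_failures:
            doomed.append(job["metadata"]["name"])
    return doomed


if __name__ == "__main__":
    sample = [
        {"metadata": {"name": "ok-job"}, "status": {"succeeded": 1}},
        {"metadata": {"name": "flaky-job"}, "status": {"failed": 1}},
        {"metadata": {"name": "broken-job"}, "status": {"failed": 5}},
    ]
    print(jobs_to_clean_up(sample))  # prints ['broken-job']
```

The actual cleanup (e.g. `kubectl delete job <name>` for each returned name) is deliberately left out so the selection logic can be tested on its own.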

asked Feb 01 '16 by alph486


2 Answers

Summary of slack discussion:

No, there is no retry limit. However, as of v1.2 you can set a deadline on the Job with activeDeadlineSeconds. The system should back off restarts and then terminate the Job once it hits the deadline.
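A minimal sketch of where that field goes, assuming a batch/v1 Job. It builds the manifest as a plain dict and prints it as JSON, which kubectl accepts as well as YAML. The Job name, image and failing command are hypothetical placeholders chosen only to produce a Job that keeps failing.

```python
import json


def job_with_deadline(name: str, image: str, deadline_seconds: int) -> dict:
    """Build a batch/v1 Job manifest that is stopped after deadline_seconds."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Wall-clock limit for the whole Job, however many Pods it creates;
            # once exceeded, running Pods are terminated and the Job is marked failed.
            "activeDeadlineSeconds": deadline_seconds,
            "template": {
                "spec": {
                    # Job Pods must use Never or OnFailure.
                    "restartPolicy": "Never",
                    "containers": [
                        # Hypothetical container that always fails, to exercise retries.
                        {"name": name, "image": image, "command": ["sh", "-c", "exit 1"]}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # e.g. python job.py | kubectl apply -f -
    print(json.dumps(job_with_deadline("flaky-batch", "busybox", 600), indent=2))
```

Because the deadline is measured in time rather than attempts, how many retries fit inside it depends on how long each attempt runs and on the back-off between them.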

answered Sep 30 '22 by briangrant


FYI, this has now been added as .spec.backoffLimit.

https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
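A minimal sketch of a Job that uses the retry limit, again built as a dict and printed as JSON. Per the Kubernetes Job documentation, backoffLimit defaults to 6, failed Pods are recreated with an exponential back-off delay (10s, 20s, 40s, ...) capped at six minutes, and activeDeadlineSeconds takes precedence over backoffLimit when both are set. The approx_backoff_delays helper is a hypothetical illustration of that documented schedule, not the controller's actual code; the name and image are placeholders.

```python
import json
from typing import List


def job_with_backoff_limit(name: str, image: str, backoff_limit: int) -> dict:
    """Build a batch/v1 Job manifest that gives up after backoff_limit retries."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries before the Job is marked failed (default 6).
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": ["sh", "-c", "exit 1"]}
                    ],
                }
            },
        },
    }


def approx_backoff_delays(retries: int, base: int = 10, cap: int = 360) -> List[int]:
    """Approximate delay in seconds before each retry: 10, 20, 40, ... capped at 6 min."""
    return [min(base * 2 ** i, cap) for i in range(retries)]


if __name__ == "__main__":
    print(json.dumps(job_with_backoff_limit("flaky-batch", "busybox", 3), indent=2))
    print(approx_backoff_delays(6))  # prints [10, 20, 40, 80, 160, 320]
```

With the default limit of 6, a Job whose Pods fail immediately would spend roughly the sum of those delays (about ten minutes) backing off before it is marked failed, so backoffLimit covers the "number of failures" case from the question without an external watchdog.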

answered Sep 30 '22 by Dave Koston