
How to fail a (cron) job after a certain number of retries?

Tags:

kubernetes

We have a Kubernetes cluster running a set of web-scraping cron jobs. Everything works until a cron job starts to fail (e.g., when a site's structure changes and our scraper no longer works). Every now and then, a few failing cron jobs keep retrying until they bring down the cluster. Running kubectl get cronjobs (prior to a cluster failure) shows far too many jobs running for a failing cron job.

I've tried following the note described here regarding a known issue with the pod backoff failure policy; however, that does not seem to work.

Here is our config for reference:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scrape-al
spec:
  schedule: '*/15 * * * *'
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 0
  successfulJobsHistoryLimit: 0
  jobTemplate:
    metadata:
      labels:
        app: scrape
        scrape: al
    spec:
      template:
        spec:
          containers:
            - name: scrape-al
              image: 'govhawk/openstates:1.3.1-beta'
              command:
                - /opt/openstates/openstates/pupa-scrape.sh
              args:
                - al bills --scrape
          restartPolicy: Never
      backoffLimit: 3

Ideally, a cron job would be terminated after N retries (e.g., something like kubectl delete cronjob my-cron-job after my-cron-job has failed 5 times). Any ideas or suggestions would be much appreciated. Thanks!

Asked by doubleswirve, Jan 29 '18 at 16:01



1 Answer

You can tell your Job to stop retrying using backoffLimit.

Specifies the number of retries before marking this job failed.

In your case:

spec:
  template:
    spec:
      containers:
        - name: scrape-al
          image: 'govhawk/openstates:1.3.1-beta'
          command:
            - /opt/openstates/openstates/pupa-scrape.sh
          args:
            - al bills --scrape
      restartPolicy: Never
  backoffLimit: 3

You set backoffLimit to 3 for your Job. That means that when the CronJob creates a Job, the Job will retry up to 3 times if it fails. This setting controls the Job, not the CronJob.

When a Job fails, the CronJob still creates another Job at the next scheduled time.
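To make the interaction between the two settings concrete, here is a rough back-of-the-envelope estimate. It assumes every Job fails, that each Job uses all of its retries, and that backoffLimit: 3 allows about 4 pod attempts per Job (the first run plus 3 retries). The function name is made up for illustration.

```python
def failed_pods_per_hour(schedule_interval_minutes: int, backoff_limit: int) -> int:
    """Estimate pods started per hour by a CronJob whose Jobs always fail."""
    # One Job is created per schedule tick, regardless of earlier failures.
    jobs_per_hour = 60 // schedule_interval_minutes
    # Assumption: one initial attempt plus backoff_limit retries per Job.
    attempts_per_job = backoff_limit + 1
    return jobs_per_hour * attempts_per_job


# The config in the question: schedule '*/15 * * * *', backoffLimit: 3
print(failed_pods_per_hour(15, 3))  # 16
```

So backoffLimit caps the work done by each Job, but the schedule keeps producing new Jobs for as long as the CronJob is active.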

What you want: if I understand correctly, you want to stop scheduling new Jobs once your scheduled Jobs have failed 5 times. Right?

Answer: this is not possible automatically.

Possible solution: suspend the CronJob so that it stops scheduling new Jobs. To do that, set this field in the CronJob's spec:

suspend: true

You can do this manually. If you do not want to do it manually, you need to set up a watcher that watches your CronJob's status and updates the CronJob to suspend it when necessary.
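A minimal sketch of such a watcher follows. It is one possible approach, not a built-in Kubernetes feature. It assumes kubectl is installed and configured for the cluster, that the Jobs carry the labels from the jobTemplate in the question (app=scrape, scrape=al), and that failedJobsHistoryLimit is raised to at least the threshold. With the value 0 from the question, failed Jobs are removed right away and there would be nothing to count. All function names, the threshold, and the polling interval are hypothetical.

```python
import json
import subprocess
import time


def has_condition(job: dict, cond_type: str) -> bool:
    """True if the Job has the given condition type with status "True"."""
    conditions = job.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == cond_type and c.get("status") == "True"
        for c in conditions
    )


def consecutive_failures(jobs: list) -> int:
    """Count the most recent finished Jobs that failed, stopping at a success."""
    finished = [
        j for j in jobs
        if has_condition(j, "Failed") or has_condition(j, "Complete")
    ]
    # creationTimestamp is an RFC 3339 UTC string, so string order is time order.
    finished.sort(key=lambda j: j["metadata"]["creationTimestamp"], reverse=True)
    count = 0
    for job in finished:
        if not has_condition(job, "Failed"):
            break
        count += 1
    return count


def list_jobs(selector: str) -> list:
    """Fetch the Jobs matching a label selector via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "jobs", "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["items"]


def suspend_cronjob(name: str) -> None:
    """Set spec.suspend to true so no new Jobs are scheduled."""
    subprocess.run(
        ["kubectl", "patch", "cronjob", name, "-p", '{"spec":{"suspend":true}}'],
        check=True,
    )


def watch(name: str, selector: str, threshold: int = 5, poll_seconds: int = 60) -> None:
    """Poll the Jobs and suspend the CronJob after `threshold` failures in a row."""
    while True:
        if consecutive_failures(list_jobs(selector)) >= threshold:
            suspend_cronjob(name)
            return
        time.sleep(poll_seconds)
```

For the CronJob in the question, calling watch("scrape-al", "app=scrape,scrape=al") would poll once a minute and suspend scrape-al after five failed Jobs in a row. Suspending only stops new Jobs from being scheduled, so the CronJob can be resumed later by setting suspend back to false once the scraper is fixed.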

Answered by Shahriar, Sep 19 '22 at 07:09