We have a Kubernetes cluster running web-scraping cron jobs. Everything works until a cron job starts to fail (e.g., when a site's structure changes and our scraper no longer works). Every now and then, a few failing cron jobs keep retrying until they bring down the cluster. Running kubectl get cronjobs (prior to a cluster failure) shows too many jobs running for a failing job.
I've attempted following the note described here regarding a known issue with the pod backoff failure policy; however, that does not seem to work.
Here is our config for reference:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scrape-al
spec:
  schedule: '*/15 * * * *'
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 0
  successfulJobsHistoryLimit: 0
  jobTemplate:
    metadata:
      labels:
        app: scrape
        scrape: al
    spec:
      template:
        spec:
          containers:
            - name: scrape-al
              image: 'govhawk/openstates:1.3.1-beta'
              command:
                - /opt/openstates/openstates/pupa-scrape.sh
              args:
                - al bills --scrape
          restartPolicy: Never
      backoffLimit: 3
Ideally, we would like a cron job to be terminated after N retries (e.g., something like kubectl delete cronjob my-cron-job after my-cron-job has failed 5 times). Any ideas or suggestions would be much appreciated. Thanks!
You can tell your Job to stop retrying using backoffLimit.
Specifies the number of retries before marking this job failed.
In your case
    spec:
      template:
        spec:
          containers:
            - name: scrape-al
              image: 'govhawk/openstates:1.3.1-beta'
              command:
                - /opt/openstates/openstates/pupa-scrape.sh
              args:
                - al bills --scrape
          restartPolicy: Never
      backoffLimit: 3
You set backoffLimit to 3 for your Job. That means that when a Job is created by the CronJob, it will retry up to 3 times if it fails. This setting controls the Job, not the CronJob.
When a Job fails, the CronJob will still create another Job at the next scheduled time.
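To see why failing runs pile up, here is a rough back-of-the-envelope sketch in Python. It assumes every pod fails and that each Job creates one initial pod plus up to backoffLimit retries (the exact count can vary slightly between Kubernetes versions). The function name is hypothetical and only used for illustration.

```python
def max_pods_per_hour(schedule_interval_minutes: int, backoff_limit: int) -> int:
    """Rough upper bound on failed pods created per hour by one CronJob.

    Assumption: every pod fails, so each Job uses its initial attempt
    plus all of its retries (1 + backoff_limit pods per Job).
    """
    runs_per_hour = 60 // schedule_interval_minutes
    pods_per_job = 1 + backoff_limit
    return runs_per_hour * pods_per_job


# The config in the question: schedule '*/15 * * * *', backoffLimit: 3
print(max_pods_per_hour(15, 3))
```

Under these assumptions the backoffLimit only caps the retries of each individual Job; the CronJob keeps starting a fresh Job every 15 minutes regardless.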
You want: If I understand correctly, you want to stop scheduling new Jobs once your scheduled Jobs have failed 5 times. Right?
Answer: This cannot be done automatically.
Possible solution: You need to suspend the CronJob so that it stops scheduling new Jobs, by setting this field in the CronJob spec:
spec:
  suspend: true
You can do this manually. If you do not want to do it manually, you need to set up a watcher that watches your CronJob's status and updates the CronJob to suspended when necessary.
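As a minimal sketch of the decision logic such a watcher might use (not a complete watcher), the Python below takes a list of Job objects shaped like the items in the output of kubectl get jobs -o json and returns a patch body once the most recent finished Jobs have failed N times in a row. The function names and the max_failures parameter are hypothetical. It also assumes failed Jobs are still visible, which means failedJobsHistoryLimit would need to be at least N rather than 0.

```python
import json


def consecutive_failures(jobs, cronjob_name):
    """Count how many of the most recent finished Jobs owned by the CronJob failed in a row."""
    owned = [
        job for job in jobs
        if any(
            ref.get("kind") == "CronJob" and ref.get("name") == cronjob_name
            for ref in job.get("metadata", {}).get("ownerReferences", [])
        )
    ]
    # Newest first. Kubernetes timestamps are RFC 3339 in UTC, so they sort as strings.
    owned.sort(key=lambda job: job.get("status", {}).get("startTime", ""), reverse=True)

    count = 0
    for job in owned:
        conditions = job.get("status", {}).get("conditions", [])
        finished = {c.get("type") for c in conditions if c.get("status") == "True"}
        if "Failed" in finished:
            count += 1
        elif "Complete" in finished:
            break  # a success ends the failure streak
        # Jobs with no terminal condition are still running; skip them.
    return count


def suspend_patch_if_needed(jobs, cronjob_name, max_failures=5):
    """Return a JSON patch body that suspends the CronJob, or None if no action is needed."""
    if consecutive_failures(jobs, cronjob_name) >= max_failures:
        return json.dumps({"spec": {"suspend": True}})
    return None
```

The returned body could then be applied with something like kubectl patch cronjob scrape-al -p '{"spec":{"suspend":true}}', run periodically or from whatever process watches the Jobs.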