Is there a way to monitor kube cron jobs using Prometheus?

Is there a way to monitor a Kubernetes CronJob?

I have a Kubernetes CronJob that runs every 10 minutes on my cluster. Is there a way to collect metrics every time the CronJob fails due to an error, or to get notified when it has not completed after a certain period of time?

user3587892 asked Nov 17 '17 05:11


2 Answers

I'm using these rules with kube-state-metrics:

groups:
- name: job.rules
  rules:
  - alert: CronJobRunning
    expr: time() - kube_cronjob_next_schedule_time > 3600
    for: 1h
    labels:
      severity: warning
    annotations:
      description: CronJob {{$labels.namespace}}/{{$labels.cronjob}} is taking more than 1h to complete
      summary: CronJob didn't finish after 1h

  - alert: JobCompletion
    expr: kube_job_spec_completions - kube_job_status_succeeded > 0
    for: 1h
    labels:
      severity: warning
    annotations:
      description: Job {{$labels.namespace}}/{{$labels.job}} is taking more than 1h to complete
      summary: Job {{$labels.job}} didn't complete after 1h

  - alert: JobFailed
    expr: kube_job_status_failed > 0
    for: 1h
    labels:
      severity: warning
    annotations:
      description: Job {{$labels.namespace}}/{{$labels.job}} failed to complete
      summary: Job failed
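
The question also asks about Jobs that are still running after a certain period. The rule below is only a sketch of one way that could be expressed, not part of the rules above. It assumes kube-state-metrics exposes kube_job_status_start_time and kube_job_status_active with identical label sets, and the alert name, the 30-minute threshold, and the 5-minute hold are placeholders chosen for illustration.

```yaml
groups:
- name: job-duration.rules        # hypothetical group name
  rules:
  - alert: JobRunningTooLong      # hypothetical alert name
    # Fires while a Job still has active pods more than 30 minutes
    # (1800 s, an assumed threshold) after its recorded start time.
    expr: (time() - kube_job_status_start_time > 1800) and (kube_job_status_active > 0)
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Job has been running for more than 30 minutes
```

For a CronJob that runs every 10 minutes, the threshold would probably be set a little above the longest expected run time.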

Camil answered Oct 15 '22 07:10


The tricky part here is that the CronJobs themselves have no useful status; you have to match them to the Jobs they create. I've written up an article on how to achieve this:

https://medium.com/@tristan_96324/prometheus-k8s-cronjob-alerts-94bee7b90511

The article goes into a bit of detail about how things work, but the alert config is as follows:

groups:
- name: kube-cron
  rules:
  - record: job_cronjob:kube_job_status_start_time:max
    expr: |
      label_replace(
        label_replace(
          max(
            kube_job_status_start_time
            * ON(exported_job) GROUP_RIGHT()
            kube_job_labels{label_cronjob!=""}
          ) BY (exported_job, label_cronjob)
          == ON(label_cronjob) GROUP_LEFT()
          max(
            kube_job_status_start_time
            * ON(exported_job) GROUP_RIGHT()
            kube_job_labels{label_cronjob!=""}
          ) BY (label_cronjob),
          "job", "$1", "exported_job", "(.+)"),
        "cronjob", "$1", "label_cronjob", "(.+)")

  - record: job_cronjob:kube_job_status_failed:sum
    expr: |
      clamp_max(
        job_cronjob:kube_job_status_start_time:max,
      1)
      * ON(job) GROUP_LEFT()
      label_replace(
        label_replace(
          (kube_job_status_failed != 0),
          "job", "$1", "exported_job", "(.+)"),
        "cronjob", "$1", "label_cronjob", "(.+)")


  - alert: CronJobStatusFailed
    expr: |
      job_cronjob:kube_job_status_failed:sum
      * ON(cronjob) GROUP_RIGHT()
      kube_cronjob_labels
      > 0
    for: 1m
    annotations:
      description: '{{ $labels.cronjob }} last run has failed {{ $value }} times.'

The jobTemplate must include a label called cronjob that matches the name of the cronjob object.
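
As an illustration of that requirement, a CronJob manifest might look roughly like the following. This is a minimal sketch: the name example-task, the schedule, the image, and the command are placeholders, and the apiVersion assumes a reasonably recent cluster.

```yaml
# Hypothetical CronJob; name, schedule, image, and command are placeholders.
apiVersion: batch/v1            # older clusters may need batch/v1beta1
kind: CronJob
metadata:
  name: example-task
spec:
  schedule: "*/10 * * * *"      # every 10 minutes
  jobTemplate:
    metadata:
      labels:
        cronjob: example-task   # must equal metadata.name above
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: task
            image: busybox
            command: ["sh", "-c", "echo done"]
```

The label set on the jobTemplate is copied onto each Job the CronJob creates, which is what lets kube_job_labels carry label_cronjob. Newer kube-state-metrics releases may only export labels that have been explicitly allowlisted (for example through the --metric-labels-allowlist flag), so it is worth checking the documentation for the version in use.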

Tristan Colgate answered Oct 15 '22 06:10