 

Kubernetes Cronjob Only Runs Half the Time

Tags:

kubernetes

I want a job to trigger every 15 minutes, but it is consistently triggering every 30 minutes.

UPDATE:

I've simplified the problem by just running:

kubectl run hello --schedule="*/1 * * * *" --restart=OnFailure --image=busybox -- /bin/sh -c "date; echo Hello from the Kubernetes cluster"

As specified in the docs here: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/

and yet the job still refuses to run on time.

$ kubectl get cronjobs
NAME               SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello              */1 * * * *   False     1         5m              30m
hello2             */1 * * * *   False     1         5m              12m

It took 25 minutes for the cronjob created from the command line to run, and 7 minutes for the one created from YAML. They were both finally scheduled at the same time, so it's almost as if etcd finally woke up and did something?
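
For reference, the YAML route follows the example in the linked docs; hello2 was created from a manifest along these lines (the name hello2 is my assumption, everything else mirrors the docs example and the kubectl run command above):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello2
spec:
  schedule: "*/1 * * * *"   # every minute, same as the kubectl run version
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello2
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure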

ORIGINAL ISSUE:

When I drill into an active job, I see Status: Terminated: Completed, but Age: 25 minutes or something greater than 15.

In the logs I see that the Python script meant to run has completed its final print statement. The script takes about two minutes to complete, based on its output file in S3. Then no new job is scheduled for 28 more minutes.
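
This kind of inspection amounts to standard commands along these lines (the pod name is a placeholder):

$ kubectl get jobs -n extract                 # jobs spawned by the cronjob, with AGE
$ kubectl describe pod <pod-name> -n extract  # container State shows Terminated / Completed
$ kubectl logs <pod-name> -n extract          # the Python script's print output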

I have tried with different configurations:

Schedule: */15 * * * * AND Schedule: 0,15,30,45 * * * *

As well as

Concurrency Policy: Forbid AND Concurrency Policy: Replace

What else could be going wrong here?

Full config with identifying lines modified:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    type: f-c
  name: f-c-p
  namespace: extract
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            type: f-c
        spec:
          containers:
          - args:
            - /f_c.sh
            image: identifier.amazonaws.com/extract_transform:latest
            imagePullPolicy: Always
            env:
            - name: ENV
              value: prod
            - name: SLACK_TOKEN
              valueFrom:
                secretKeyRef:
                  key: slack_token
                  name: api-tokens
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: aws_access_key_id
                  name: api-tokens
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: aws_secret_access_key
                  name: api-tokens
            - name: F_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  key: f_access_token
                  name: api-tokens
            name: s-f-c
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '*/15 * * * *'
  successfulJobsHistoryLimit: 1
  suspend: false
status: {}
ProGirlXOXO asked May 08 '18 01:05

1 Answer

After running these jobs in a test cluster, I discovered that external circumstances prevented them from running as intended.

On the original cluster there were ~20k scheduled jobs. The built-in Kubernetes scheduler is not yet capable of handling this volume consistently.

The maximum number of jobs that can be run reliably (within a minute of the intended time) may depend on the size of your master nodes.
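
If you suspect the same problem, a quick sanity check (assuming kubectl access to the affected cluster) is to count how many cronjobs the controller is tracking and to compare LAST SCHEDULE against each SCHEDULE:

$ kubectl get cronjobs --all-namespaces --no-headers | wc -l  # total cronjobs cluster-wide
$ kubectl get cronjobs --all-namespaces                       # LAST SCHEDULE lag reveals missed runs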

ProGirlXOXO answered Oct 11 '22 14:10