
Wait for a Kubernetes job to complete, on either failure or success, using the command line

What is the best way to wait for a Kubernetes job to complete? I have seen a lot of suggestions to use:

kubectl wait --for=condition=complete job/myjob

But I think that only works if the job succeeds. If it fails, I have to do something like:

kubectl wait --for=condition=failed job/myjob

Is there a way to wait for both conditions using wait? If not, what is the best way to wait for a job to either succeed or fail?

asked Mar 09 '19 02:03 by ruazn2



3 Answers

Run the first wait condition as a subprocess and capture its PID. If the condition is met, this process will exit with an exit code of 0.

kubectl wait --for=condition=complete job/myjob &
completion_pid=$!

Do the same for the failure wait condition. The trick here is to add && exit 1 so that the subprocess returns a non-zero exit code when the job fails.

kubectl wait --for=condition=failed job/myjob && exit 1 &
failure_pid=$!

Then use the Bash builtin wait -n $PID1 $PID2 to block until the first of the two background processes exits. wait -n returns the exit code of whichever process exits first.

Note for macOS users: wait -n [...PID] requires Bash version 4.3 or higher. macOS still ships version 3.2 because of licensing issues. Please see this Stack Overflow post on how to install the latest version.

wait -n $completion_pid $failure_pid

Finally, you can check the actual exit code of wait -n to see whether the job failed or not:

exit_code=$?

if (( $exit_code == 0 )); then
  echo "Job completed"
else
  echo "Job failed with exit code ${exit_code}, exiting..."
fi

exit $exit_code
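
Because wait -n depends on the Bash version, a script can check for it up front instead of failing halfway through. The sketch below is one way to do that. supports_wait_n is a hypothetical helper name, and the 4.3 threshold is simply the version quoted above.

```shell
# Hypothetical helper: succeeds when the given Bash version string
# (default: the running shell's $BASH_VERSION) is 4.3 or newer.
supports_wait_n() {
  version="${1:-${BASH_VERSION:-0.0}}"
  major="${version%%.*}"   # text before the first dot, e.g. "5"
  rest="${version#*.}"     # text after the first dot, e.g. "2.15(1)-release"
  minor="${rest%%.*}"      # text before the next dot, e.g. "2"
  [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 3 ]; }
}

# Possible use at the top of a script:
#   supports_wait_n || { echo "wait -n needs Bash 4.3 or newer" >&2; exit 1; }
```

Passing the version as an optional argument is only there to make the check easy to try out with different version strings.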

Complete example:

# wait for completion as background process - capture PID
kubectl wait --for=condition=complete job/myjob &
completion_pid=$!

# wait for failure as background process - capture PID
kubectl wait --for=condition=failed job/myjob && exit 1 &
failure_pid=$!

# capture exit code of the first subprocess to exit
wait -n $completion_pid $failure_pid

# store exit code in variable
exit_code=$?

if (( $exit_code == 0 )); then
  echo "Job completed"
else
  echo "Job failed with exit code ${exit_code}, exiting..."
fi

exit $exit_code
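
One caveat with the script above: kubectl wait gives up after its --timeout (30 seconds by default) and exits non-zero, so a job that runs longer than that would be reported as failed. A minimal sketch that passes an explicit timeout and wraps the same steps in a function might look like this. wait_for_job and the 600s default are assumptions for illustration, not part of kubectl.

```shell
# Sketch only: wait_for_job and its 600s default are hypothetical.
# Requires Bash 4.3+ for `wait -n`.
wait_for_job() {
  job="$1"
  timeout="${2:-600s}"   # kubectl wait would otherwise stop after 30s

  # Watcher 1: exits 0 once the Job reports condition=complete.
  kubectl wait --for=condition=complete --timeout="$timeout" "job/$job" &
  completion_pid=$!

  # Watcher 2: exits 1 once the Job reports condition=failed.
  kubectl wait --for=condition=failed --timeout="$timeout" "job/$job" && exit 1 &
  failure_pid=$!

  # Take the exit code of whichever watcher finishes first.
  exit_code=0
  wait -n "$completion_pid" "$failure_pid" || exit_code=$?

  # Best-effort cleanup: signal the watcher still in the background, then reap it.
  kill "$completion_pid" "$failure_pid" 2>/dev/null || true
  wait "$completion_pid" "$failure_pid" 2>/dev/null || true
  return "$exit_code"
}
```

Calling wait_for_job myjob 900s returns 0 when the job completes. A non-zero result means the job failed or neither condition was reached within the timeout.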

answered Oct 21 '22 23:10 by Sebastian N


The wait -n approach does not work for me, as I need it to work on both Linux and macOS.

I improved on the answer provided by Clayton a little, because his script would not work with set -e -E enabled. The following will work even in that case.

while true; do
  if kubectl wait --for=condition=complete --timeout=0 job/name 2>/dev/null; then
    job_result=0
    break
  fi

  if kubectl wait --for=condition=failed --timeout=0 job/name 2>/dev/null; then
    job_result=1
    break
  fi

  sleep 3
done

if [[ $job_result -eq 1 ]]; then
    echo "Job failed!"
    exit 1
fi

echo "Job succeeded"

Depending on your situation, you might want to add a timeout to avoid an infinite loop.
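
As a rough sketch of such a timeout, the loop can count the seconds it has slept and give up once a limit is reached. poll_job, max_wait and interval are hypothetical names chosen for illustration, and the 600-second default is an assumption.

```shell
# Sketch only: poll until the Job is complete or failed, or until max_wait passes.
# Returns 0 on completion, 1 on failure, 2 if neither happened in time.
poll_job() {
  job="$1"
  max_wait="${2:-600}"   # seconds before giving up (assumed default)
  interval="${3:-3}"     # seconds between polls, as in the loop above
  waited=0               # counts sleep time only, so the limit is approximate

  while [ "$waited" -lt "$max_wait" ]; do
    if kubectl wait --for=condition=complete --timeout=0 "job/$job" >/dev/null 2>&1; then
      return 0
    fi
    if kubectl wait --for=condition=failed --timeout=0 "job/$job" >/dev/null 2>&1; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done

  return 2
}
```

Because every kubectl call sits inside an if, the function should also behave under set -e. The distinct return code 2 lets the caller tell a timeout apart from a failed job.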

answered Oct 22 '22 00:10 by Martin Melka


You can take advantage of how kubectl wait behaves with --timeout=0.

In this scenario, the command returns immediately with an exit code of either 0 or 1. Here's an example:

retval_complete=1
retval_failed=1
while [[ $retval_complete -ne 0 ]] && [[ $retval_failed -ne 0 ]]; do
  sleep 5
  output=$(kubectl wait --for=condition=failed job/job-name --timeout=0 2>&1)
  retval_failed=$?
  output=$(kubectl wait --for=condition=complete job/job-name --timeout=0 2>&1)
  retval_complete=$?
done

if [ $retval_failed -eq 0 ]; then
    echo "Job failed. Please check logs."
    exit 1
fi

So when either condition=failed or condition=complete is true, execution will exit the while loop (retval_complete or retval_failed will be 0).

Next, you only need to check and act on the condition you want. In my case, I want to fail fast and stop execution when the job fails.
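
If the same check is needed in several places, the two --timeout=0 calls can be folded into a one-shot helper that prints the job's current state. This is only a sketch: job_state and the three state names are hypothetical, not kubectl output.

```shell
# Sketch only: report the Job's state once, without waiting.
# Prints "failed", "complete" or "pending".
job_state() {
  job="$1"
  if kubectl wait --for=condition=failed --timeout=0 "job/$job" >/dev/null 2>&1; then
    echo "failed"
  elif kubectl wait --for=condition=complete --timeout=0 "job/$job" >/dev/null 2>&1; then
    echo "complete"
  else
    # Also reached when kubectl itself errors out (for example, unknown job).
    echo "pending"
  fi
}
```

A caller can then branch on the result, for example exiting as soon as "$(job_state job-name)" equals failed.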

answered Oct 22 '22 01:10 by Clayton