Submit and monitor SLURM jobs using Apache Airflow

I am using the Slurm job scheduler to run my jobs on a cluster. What is the most efficient way to submit the Slurm jobs and check on their status using Apache Airflow?

I was able to use an SSHOperator to submit my jobs remotely and check on their status every minute until they complete, but I wonder if anyone knows a better way. Below is the SSHOperator task I wrote.

from airflow.contrib.hooks.ssh_hook import SSHHook
from airflow.contrib.operators.ssh_operator import SSHOperator

sshHook = SSHHook(ssh_conn_id='my_conn_id', keepalive_interval=240)

task_ssh_bash = """
cd ~/projects &&
JID=$(sbatch myjob.sh)   # prints "Submitted batch job <id>"
echo $JID
sleep 10s                # give sacct time to register the new job
ST="PENDING"
while [ "$ST" != "COMPLETED" ]; do
   # ${JID##* } keeps only the last word of $JID, i.e. the numeric job id;
   # FNR == 3 skips the two header lines of sacct's output
   ST=$(sacct -j ${JID##* } -o State | awk 'FNR == 3 {print $1}')
   sleep 1m
   if [ "$ST" == "FAILED" ]; then
      echo "Job final status: $ST, exiting..."
      exit 122
   fi
   echo $ST
done
"""

task_ssh = SSHOperator(
    task_id='test_ssh_operator',
    ssh_hook=sshHook,
    do_xcom_push=True,
    command=task_ssh_bash,
    dag=dag)
asked May 27 '19 by stardust


1 Answer

I can't give a demonstrable example, but my inclination would be to implement an Airflow sensor on top of something like pyslurm. Funnily enough, I came across your question while looking to see whether anyone had already done this!
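
For what it's worth, here is a minimal sketch of that sensor idea, polling sacct over the question's SSH connection instead of blocking inside a shell loop. The class name SlurmJobSensor, the terminal-state list, and the XCom wiring below are my own assumptions rather than an established API; the sacct call could be swapped for pyslurm if it is installed where the sensor runs. Import paths assume Airflow 1.10-era contrib modules, matching the question.

from airflow.contrib.hooks.ssh_hook import SSHHook
from airflow.exceptions import AirflowException
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.decorators import apply_defaults


class SlurmJobSensor(BaseSensorOperator):
    """Pokes sacct over SSH until the Slurm job reaches a terminal state."""

    # allow job_id to be templated, e.g. pulled from XCom
    template_fields = ('job_id',)

    @apply_defaults
    def __init__(self, job_id, ssh_conn_id='my_conn_id', *args, **kwargs):
        super(SlurmJobSensor, self).__init__(*args, **kwargs)
        self.job_id = job_id
        self.ssh_conn_id = ssh_conn_id

    def poke(self, context):
        hook = SSHHook(ssh_conn_id=self.ssh_conn_id)
        client = hook.get_conn()  # paramiko SSHClient
        try:
            # -X restricts output to the job allocation (no steps) and
            # --noheader drops the header, so the first token is the state
            _, stdout, _ = client.exec_command(
                "sacct -j {} -X -o State --noheader".format(self.job_id))
            fields = stdout.read().decode().split()
        finally:
            client.close()
        if not fields:
            return False  # accounting has not registered the job yet
        state = fields[0]
        if state in ("FAILED", "CANCELLED", "TIMEOUT"):
            raise AirflowException(
                "Slurm job {} ended in state {}".format(self.job_id, state))
        return state == "COMPLETED"

The submit task would then only run sbatch, e.g. command='cd ~/projects && sbatch --parsable myjob.sh' with do_xcom_push=True (--parsable makes sbatch print just the job id), and the sensor picks the id up via job_id="{{ ti.xcom_pull(task_ids='submit_job') }}" with something like poke_interval=60. With mode='reschedule' the worker slot is freed between pokes, which is the main advantage over the while loop.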

EDIT: there is also an interesting topic regarding the use of executors for submitting jobs.

Best of luck

answered Oct 17 '22 by JimCircadian