 

Airflow DAG gets stuck in running state

Tags:

airflow

I created a DAG and scheduled it to run daily. It gets queued every day, but the tasks never actually run. This problem has been raised here before, but those answers didn't help me, so there seems to be a different cause.

My code is below; I replaced the SQL of task t2 with a placeholder. Each task runs successfully when I run it on its own from the CLI using "airflow test...".

Can you explain what should be done to make the DAG run? Thanks!

This is the DAG code:

from datetime import timedelta, datetime
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator



default_args = {
    'owner': 'me',
    'depends_on_past': True,  # a real boolean rather than the string 'true'
    'start_date': datetime(2018, 6, 25),  # '06' is a SyntaxError in Python 3
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}


dag = DAG(
    'my_agg_table',
    default_args=default_args,
    schedule_interval="30 4 * * *",  # every day at 04:30
)



t1 = BigQueryOperator(
    task_id='bq_delete_my_agg_table',
    use_legacy_sql=False,
    write_disposition='WRITE_TRUNCATE',
    allow_large_results=True,
    bql='''
    delete `my_project.agg.my_agg_table`
    where date = '{{ macros.ds_add(ds, -1)}}'
    ''',
    dag=dag)

t2 = BigQueryOperator(
    task_id='bq_insert_my_agg_table',
    use_legacy_sql=False,
    write_disposition='WRITE_APPEND',
    allow_large_results=True,
    bql='''
    #standardSQL
    Select ... the query continue here.....
    ''',
    destination_dataset_table='my_project.agg.my_agg_table',
    dag=dag)


t1 >> t2
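To get a feel for how many runs the scheduler has to work through for this DAG, here is a rough stdlib-only sketch (a simplified model, not Airflow's own scheduling code) of the daily execution dates implied by `start_date=datetime(2018, 6, 25)` and `schedule_interval="30 4 * * *"`. It assumes Airflow's usual behaviour that the run for a given execution date is only triggered once that schedule interval has ended, and that catchup is enabled (the default). The helper `daily_runs` is hypothetical, and the cutoff is simply the timestamp of this question (timezone unknown).

```python
from datetime import datetime, timedelta


def daily_runs(start_date, hour, minute, now):
    """Hypothetical helper: execution dates of a daily 'M H * * *' schedule
    whose interval has fully ended by `now` (simplified model of catchup)."""
    # First schedule point at or after start_date.
    first = start_date.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if first < start_date:
        first += timedelta(days=1)

    runs = []
    execution_date = first
    # A run is triggered only once its interval [execution_date, +1 day) has ended.
    while execution_date + timedelta(days=1) <= now:
        runs.append(execution_date)
        execution_date += timedelta(days=1)
    return runs


runs = daily_runs(datetime(2018, 6, 25), hour=4, minute=30,
                  now=datetime(2018, 7, 10, 9, 7))
print(len(runs), runs[0], runs[-1])
# 15 2018-06-25 04:30:00 2018-07-09 04:30:00
```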
Saar Porat asked Jul 10 '18 09:07


1 Answer

It is usually easy to find out why a task is not being run. In the Airflow web UI:

  • select the DAG of interest
  • click on the task
  • click on Task Instance Details
  • in the first row there is a panel, Task Instance State
  • the Reason box next to it tells you why the task is being run, or why it is being ignored

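It usually makes sense to start with the first task that is not being executed. I also noticed that you set depends_on_past=True, which can lead to problems if used in the wrong scenario: with it, a task instance only runs once the same task has succeeded (or was skipped) in the previous scheduled run, so a single failed or unfinished run blocks every later one.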
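The blocking effect of depends_on_past can be illustrated with a small pure-Python model (not Airflow code). It assumes the rule described above: a task instance is eligible only if the same task's instance in the previous run ended in success or skipped, and the very first run is never blocked. The function names, the run outcomes and the "waiting" label are all made up for illustration; in Airflow the blocked instance typically just stays unscheduled while its DAG run shows as running.

```python
from typing import List, Optional


def eligible(prev_state: Optional[str], depends_on_past: bool) -> bool:
    """Simplified model of the depends_on_past check for a single task."""
    if not depends_on_past or prev_state is None:
        # Flag is off, or there is no previous run: nothing blocks the task.
        return True
    return prev_state in ("success", "skipped")


def simulate(outcomes: List[str], depends_on_past: bool) -> List[str]:
    """Walk consecutive daily runs of one task.

    outcomes[i] is how run i would end *if* it were allowed to start.
    """
    states: List[str] = []
    prev: Optional[str] = None
    for outcome in outcomes:
        # Blocked runs get the made-up label "waiting" and never start.
        state = outcome if eligible(prev, depends_on_past) else "waiting"
        states.append(state)
        prev = state
    return states


# Hypothetical history: the second daily run fails after exhausting its retries.
outcomes = ["success", "failed", "success", "success"]
print(simulate(outcomes, depends_on_past=True))
# ['success', 'failed', 'waiting', 'waiting']
print(simulate(outcomes, depends_on_past=False))
# ['success', 'failed', 'success', 'success']
```

Because each blocked run counts as "not successful" for the next one, the model shows why the oldest non-successful run is the one to look at first.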

More on that here: Airflow 1.9.0 is queuing but not launching tasks

tobi6 answered Sep 28 '22 18:09