Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting the date of the most recent successful DAG execution

Tags:

I am looking to create a transform in Airflow, and I want to ensure to get all data from my source since the last time a DAG has run in order to update my target table. In order to this, I want to be able to get the most recent execution which was successful.

I have found this: Apache airflow macro to get last dag run execution time which gets me somewhere to the end goal, however, this only gets the last time the DAG executed, regardless of it being successful or not.

SELECT col1, col2, col3
FROM schema.table
WHERE table.updated_at > '{{ last_dag_run_execution_date(dag) }}';

If an execution fails (due to connectivity or something like), the last_dag_run_execution_date(dag) will update, but we've missed the execution for that previous DAG run.

Ideally, this will pull the most recent non-failed execution. Or if anyone has any ideas how I can meet this, please let me know

like image 693
George Cooper-Pearson Avatar asked Aug 22 '19 10:08

George Cooper-Pearson


1 Answers

I've ended up changing the function in the referenced question to use the latest_execution_date, which is a predefined macro in Airflow, as such:

def get_last_dag_run(dag):
    last_dag_run = dag.latest_execution_date
    if last_dag_run is None: 
        return '2013-01-01'
    else:
        return last_dag_run

Seems to be working for me at the moment.

like image 105
George Cooper-Pearson Avatar answered Nov 15 '22 04:11

George Cooper-Pearson