I am looking to create a transform in Airflow, and I want to make sure I get all the data from my source since the last time the DAG ran, in order to update my target table. To do this, I want to be able to get the most recent execution that was successful.
I have found this: Apache airflow macro to get last dag run execution time, which gets me most of the way to the end goal; however, it only gets the last time the DAG executed, regardless of whether that run was successful.
SELECT col1, col2, col3
FROM schema.table
WHERE table.updated_at > '{{ last_dag_run_execution_date(dag) }}';
If an execution fails (due to connectivity issues or something similar), last_dag_run_execution_date(dag) will still advance to that failed run's date, so the rows that the failed run should have processed are skipped by the next run.
Ideally, this would pull the most recent non-failed execution. If anyone has any ideas on how I can achieve this, please let me know.
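For reference, the behaviour I'm after would look something like this (a minimal sketch, untested; it assumes querying the DagRun model via DagRun.find with State.SUCCESS, and the helper name is my own):

from airflow.models import DagRun
from airflow.utils.state import State

def get_last_successful_dag_run_date(dag):
    # Query the metadata DB for this DAG's runs in the "success" state.
    successful_runs = DagRun.find(dag_id=dag.dag_id, state=State.SUCCESS)
    if not successful_runs:
        # No successful run yet: fall back to a fixed start date.
        return '2013-01-01'
    return max(run.execution_date for run in successful_runs)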
I've ended up changing the function from the referenced question to use latest_execution_date, which is a predefined property on the DAG object in Airflow, like so:
def get_last_dag_run(dag):
    # latest_execution_date is the execution date of the DAG's most
    # recent run (a property on the DAG model).
    last_dag_run = dag.latest_execution_date
    if last_dag_run is None:
        # No runs yet: fall back to a fixed start date.
        return '2013-01-01'
    return last_dag_run
Seems to be working for me at the moment.
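For completeness, this is roughly how the function gets wired into the DAG so the SQL template above can call it. The DAG id, start date, and schedule below are placeholders; user_defined_macros is the standard Airflow mechanism for exposing a function to Jinja templates:

from datetime import datetime
from airflow import DAG

dag = DAG(
    dag_id='my_transform',  # placeholder DAG id
    start_date=datetime(2013, 1, 1),
    schedule_interval='@daily',
    # Expose the function to Jinja so the SQL template can call
    # {{ last_dag_run_execution_date(dag) }}.
    user_defined_macros={
        'last_dag_run_execution_date': get_last_dag_run,
    },
)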