The problem: Airflow's execution_date is defined as the beginning of the period between runs, not the time the run actually fires. For example, a DAG on a weekly schedule would run on 2018-01-08T11:00:00, but its execution_date would be 2018-01-01T11:00:00.
The objective: I receive a file once a week, with the file's date in its name. To identify the file, I'd like to use Airflow's execution_date, but I cannot seem to find a way to get the date of the actual run rather than the earliest possible execution_date for the period.
Possible solutions:
- Adjust execution_date on the fly, something like context['execution_date'] + timedelta(days=7). This seems hacky.
- A ShortCircuitOperator at the beginning of the DAG's execution graph that exits if execution_date is not the expected date (see the sketch after this list).

All suggestions or recommendations are welcome. It's a nuanced problem, but it's causing some issues with my ETL pipeline.
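For reference, here is roughly what I had in mind for the ShortCircuitOperator idea; expected_file_date() stands in for whatever parses the date out of the incoming file's name, and the weekly DAG is assumed to be defined elsewhere:

```python
from datetime import timedelta
from airflow.operators.python_operator import ShortCircuitOperator

def _file_date_matches(**context):
    # The file is named for the date the run actually fires, which on a
    # weekly schedule is one interval after execution_date.
    run_date = (context["execution_date"] + timedelta(days=7)).date()
    # expected_file_date() is a placeholder for whatever extracts the
    # date from the incoming file's name.
    return run_date == expected_file_date()

check_file_date = ShortCircuitOperator(
    task_id="check_file_date",
    python_callable=_file_date_matches,
    provide_context=True,
    dag=dag,  # weekly DAG assumed to be defined elsewhere
)
```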
Another possible solution?
I think using execution_date + timedelta(days=7) is a bit hacky. Instead, use execution_date + schedule_interval; that way, if the interval changes, there shouldn't be any issues (I do this for one of my DAGs). If you're using a newer Airflow version, then you can use next_execution_date, which is better.
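Roughly, that looks like the following; the DAG id, file name pattern, and task are illustrative only:

```python
from datetime import timedelta
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago

# Illustrative weekly DAG; only the date arithmetic matters here.
dag = DAG(
    dag_id="weekly_file_ingest",
    schedule_interval=timedelta(days=7),
    start_date=days_ago(14),
)

def pick_file(**context):
    # execution_date is the start of the interval, so adding the DAG's
    # schedule_interval gives the date the run actually fires (this works
    # when schedule_interval is a timedelta rather than a cron string).
    run_date = context["execution_date"] + context["dag"].schedule_interval

    # On newer Airflow versions the same value is provided directly.
    run_date = context.get("next_execution_date", run_date)

    print("processing file_{:%Y-%m-%d}.csv".format(run_date))

pick_file_task = PythonOperator(
    task_id="pick_file",
    python_callable=pick_file,
    provide_context=True,  # required on Airflow 1.x
    dag=dag,
)
```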