 

What's the eloquent way to use the run date for a weekly Airflow job?

The problem: Airflow's execution_date is defined as the beginning of the period between runs. For example, a DAG on a weekly schedule would run on 2018-01-08T11:00:00, but its execution_date would be 2018-01-01T11:00:00.

The objective: I receive a file once a week, with the file's date in its name. To identify the file, I'd like to use Airflow's execution_date. But I cannot seem to find a way to use the date the DAG actually runs, rather than the execution_date, which is the earliest possible date for the period.

Possible solutions:

  • Modify the execution_date on the fly. Something like: context['execution_date'] + timedelta(days=7). This seems hacky.
  • Run the DAG daily and insert a ShortCircuitOperator at the beginning of the DAG's execution graph that exits if execution_date is not the expected date (a rough sketch of this follows below).
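
For reference, here is a minimal sketch of that second option. The DAG id, start date, landing assumptions, and the idea that the file arrives on Mondays are all illustrative, not part of the question:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import ShortCircuitOperator


def _is_expected_day(**context):
    # Continue only when the daily run's execution_date falls on the weekday
    # the file is expected (Monday here, purely as an assumption).
    return context['execution_date'].weekday() == 0


with DAG(
    dag_id='weekly_file_ingest',          # hypothetical DAG id
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
) as dag:
    gate = ShortCircuitOperator(
        task_id='check_expected_day',
        python_callable=_is_expected_day,
        provide_context=True,             # needed on Airflow 1.x
    )
    # downstream ingestion tasks would be chained after `gate`
```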

All suggestions or recommendations are welcome. It's a nuanced problem, but it's causing issues with my ETL pipeline.


1 Answer

Another possible solution?

  • Have the DAG run once a week, just after you "think" the file will arrive. Parse the names of the files in the landing area, which gives you a set of dates. Check which of these dates fall between the execution_date and execution_date + schedule_interval (or next_execution_date if you're using Airflow version >= 1.8). Then ingest the file(s) that match (see the sketch below).
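
A rough sketch of that idea, assuming the files land in a local directory and carry a YYYY-MM-DD date somewhere in their names; the directory, pattern, and helper name are all assumptions for illustration:

```python
import os
import re
from datetime import datetime


def find_files_for_window(landing_dir, window_start, window_end):
    """Return paths whose embedded date falls in [window_start, window_end).

    window_start / window_end are expected as datetime.date objects, e.g.
    execution_date.date() and next_execution_date.date() from the task context.
    """
    matched = []
    for name in os.listdir(landing_dir):
        m = re.search(r'(\d{4}-\d{2}-\d{2})', name)
        if not m:
            continue
        file_date = datetime.strptime(m.group(1), '%Y-%m-%d').date()
        if window_start <= file_date < window_end:
            matched.append(os.path.join(landing_dir, name))
    return matched
```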

I think using execution_date + timedelta(days=7) is a bit hacky. Instead, use execution_date + schedule_interval; that way, if the interval changes, there shouldn't be any issues (I do this for one of my DAGs). If you're using a newer Airflow version, then you can use next_execution_date, which is better.
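
For completeness, a hedged sketch of deriving the end of the run's window inside a task callable; the callable name is an assumption, and the fallback only works when schedule_interval is a timedelta:

```python
def _window_end(**context):
    # Airflow >= 1.8 exposes next_execution_date directly in the task context.
    if context.get('next_execution_date') is not None:
        return context['next_execution_date']
    # Fallback for older versions: add the interval to execution_date.
    # This only works when schedule_interval is a timedelta
    # (e.g. timedelta(weeks=1)); a cron string cannot be added directly.
    return context['execution_date'] + context['dag'].schedule_interval
```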



