
Airflow schedule_interval and start_date to get it to always fire the next interval

Tags:

cron

airflow

mwaa

How can I configure Airflow (MWAA) so that it fires at the same time (6am PST) every day, regardless of when the DAG is deployed?

I have tried what makes sense to me:

  1. set the schedule_interval to 0 6 * * *.
  2. set the start date to:
now = datetime.utcnow()
now = now.replace(tzinfo=pendulum.timezone('America/Los_Angeles'))
previous_five_am = now.replace(hour = 5, minute = 0, second = 0, microsecond = 0)
start_date = previous_five_am

It seems that setting the start_date to 5am of the previous day should make the DAG always fire at the next 6am, no matter what time I deploy the DAG or do an Airflow update.

cosbor11 asked Sep 19 '25 20:09


1 Answer

Your confusion may come from expecting Airflow to schedule DAGs the way cron schedules jobs, which it does not. The first DAG Run is created based on the minimum start_date for the tasks in your DAG. Subsequent DAG Runs are created sequentially by the scheduler process, based on your DAG’s schedule_interval. Airflow triggers a run at the END of its interval (see the docs); you can view this answer for examples.
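To make the "end of the interval" rule concrete, here is a minimal sketch in plain Python. It is not Airflow code: `first_run` is a hypothetical helper, the datetimes are naive values standing in for America/Los_Angeles wall-clock time, and catchup behaviour is ignored. It only models a daily 6am schedule.

```python
from datetime import datetime, timedelta


def first_run(start_date, hour=6):
    """Simplified model of a daily schedule at `hour`:00 (hypothetical helper).

    Returns (interval_start, fire_time) for the first run.
    """
    # First schedule point at or after start_date.
    interval_start = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if interval_start < start_date:
        interval_start += timedelta(days=1)
    # The run covering [interval_start, interval_start + 1 day) is only
    # triggered once that interval has ended.
    fire_time = interval_start + timedelta(days=1)
    return interval_start, fire_time


# start_date of 5am on Sep 18 (naive, read as Los Angeles wall-clock time)
interval_start, fire_time = first_run(datetime(2025, 9, 18, 5, 0))
print(interval_start)  # 2025-09-18 06:00:00
print(fire_time)       # 2025-09-19 06:00:00
```

So with a start_date of 5am on a given day, the first 6am that actually triggers a run is the one on the following day, and that run is labelled with the earlier 6am.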

As for your sample code: never set your start_date dynamically. It is a bad practice that can lead to the DAG never being executed, because now() keeps moving forward, so now() + interval may never be reached; see the Airflow FAQ.
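The effect of a moving start_date can be sketched with a small simulation in plain Python. This is an illustration under assumptions, not the scheduler's real logic: `first_fire_time`, `simulate`, `static_start` and `dynamic_start` are hypothetical names, the "scheduler" simply ticks once an hour, and naive datetimes stand in for Los Angeles wall-clock time.

```python
from datetime import datetime, timedelta


def first_fire_time(start_date, hour=6):
    """End of the first daily interval that starts at or after start_date."""
    first_point = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if first_point < start_date:
        first_point += timedelta(days=1)
    return first_point + timedelta(days=1)


def simulate(get_start_date, begin, days=5):
    """Tick hourly; return the first tick at which a run is due, or None."""
    now = begin
    end = begin + timedelta(days=days)
    while now <= end:
        # start_date is re-evaluated on every tick, as it would be each
        # time the DAG file is parsed.
        if first_fire_time(get_start_date(now)) <= now:
            return now
        now += timedelta(hours=1)
    return None


def static_start(now):
    # Fixed date in the past: does not depend on `now`.
    return datetime(2025, 9, 18, 5, 0)


def dynamic_start(now):
    # Mirrors the question: "5am today", recomputed from the current time.
    return now.replace(hour=5, minute=0, second=0, microsecond=0)


deploy_time = datetime(2025, 9, 18, 14, 0)
print(simulate(static_start, deploy_time))   # 2025-09-19 06:00:00
print(simulate(dynamic_start, deploy_time))  # None
```

With the dynamic version the end of the first interval is always "6am tomorrow", so it is never reached. In a DAG file the static equivalent would be a fixed, timezone-aware value such as pendulum.datetime(2025, 1, 1, tz='America/Los_Angeles'); the exact date there is only an example.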

Elad Kalif answered Sep 23 '25 10:09