I have a DAG that has been running every day at 3:00, and it ran fine for the past few weeks.
I've updated the schedule so it now runs at 7:00, but apparently for the last two days it didn't run. I can see the tasks for those two days with the status 'running' (in green), but no command is triggered.
Does one need to do something more to change the running time of a DAG?
I know that in the past one way to solve this was to clean the tasks for this DAG out of the meta-database and update the start_date, but I would rather avoid doing that again.
Does anyone have a suggestion?
To schedule a DAG, Airflow just takes the last execution date and adds the schedule interval to it. Once that time has passed, it runs the DAG. You cannot simply update the start date.
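That rule can be sketched in plain Python (a simplified illustration, not Airflow's actual code; the function name and dates below are made up):

```python
from datetime import datetime, timedelta

def next_run(last_execution_date, schedule_interval):
    # Airflow fires a new run once last_execution_date + schedule_interval
    # is in the past; changing the DAG's start_date does not reset this.
    return last_execution_date + schedule_interval

# Last run under the old 03:00 schedule (hypothetical date):
last = datetime(2023, 5, 1, 3, 0)
print(next_run(last, timedelta(days=1)))  # 2023-05-02 03:00:00
```

This is why editing the start date alone changes nothing: the scheduler keeps working from the last execution date it already recorded.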
For scheduling, Airflow actually supports three types of expression: a cron expression, a timedelta, or (since Airflow 2.2) a custom timetable. There are some traps in Airflow scheduling, and the most important one is the LAST RUN time of a DAG.
In addition to the logical date, Airflow 2.2 introduced another concept: data intervals. A data interval is nothing more than the time range a DAG run operates on (the period it covers).
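The relationship between the data interval and the moment a run actually fires can be shown with plain datetimes (a sketch with made-up dates; inside a real DAG run these values are exposed as `data_interval_start` and `data_interval_end`):

```python
from datetime import datetime, timedelta

# A daily DAG scheduled at 07:00 covers a one-day data interval.
data_interval_start = datetime(2023, 5, 1, 7, 0)
data_interval_end = data_interval_start + timedelta(days=1)

# The run for this interval fires only AFTER the interval has closed,
# i.e. at data_interval_end; its logical date is the interval start.
trigger_time = data_interval_end
logical_date = data_interval_start

print(trigger_time)  # 2023-05-02 07:00:00
```

So the run labelled May 1st actually executes on May 2nd, once the data for May 1st is complete.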
The Airflow scheduler is designed to run as a persistent service in an Airflow production environment. Behind the scenes, it monitors and stays in sync with a folder of DAG files, and periodically (about once a minute) inspects active tasks to see whether they can be triggered.
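A very rough sketch of that polling loop (simplified for illustration; the real scheduler is far more involved, and the DAG name, dates, and function below are invented):

```python
from datetime import datetime, timedelta

# One registered DAG with its scheduling state (hypothetical values).
dags = {"my_dag": {"last_run": datetime(2023, 5, 1, 7, 0),
                   "interval": timedelta(days=1)}}

def scheduler_pass(now):
    """One pass of the loop: trigger every DAG whose interval has expired."""
    triggered = []
    for dag_id, state in dags.items():
        if now >= state["last_run"] + state["interval"]:
            triggered.append(dag_id)
            state["last_run"] += state["interval"]
    return triggered

# Nothing is due at 06:59 the next day, but 07:01 triggers the DAG:
print(scheduler_pass(datetime(2023, 5, 2, 6, 59)))  # []
print(scheduler_pass(datetime(2023, 5, 2, 7, 1)))   # ['my_dag']
```

Each pass advances `last_run`, which is exactly the state that survives a schedule change and causes the confusion described in the question.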
A simple way to fix this is to edit your start date and schedule interval, rename your DAG (e.g. xxxx_v2.py), and redeploy it.
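A minimal sketch of what the renamed file might contain (the dag_id, filename, dates, and cron expression are placeholders; in a real file these values are passed to Airflow's `DAG(...)` constructor):

```python
from datetime import datetime

# xxxx_v2.py -- a new copy of the DAG under a new dag_id, so Airflow
# starts a fresh scheduling history instead of reusing the old one.
dag_id = "xxxx_v2"                 # renamed: old runs stay attached to "xxxx"
start_date = datetime(2023, 5, 1)  # a fresh, static start date
schedule = "0 7 * * *"             # cron expression: every day at 07:00

# In the real file these go into the DAG constructor, e.g.
#   with DAG(dag_id=dag_id, start_date=start_date, schedule_interval=schedule): ...
print(dag_id, schedule)
```

Because the scheduler keys its history on the dag_id, the renamed DAG has no last execution date and simply starts running at 07:00 from its new start date.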