
Airflow: changing the crontab time for a DAG in Airflow

Tags: airflow

I have a DAG that has been running every day at 3:00, and it ran fine for the past few weeks.

I've updated it to run at 7:00 now, but apparently it hasn't run for the last 2 days. I can see the tasks for those two days with the status 'running' (in green), but no command is triggered.

Does one need to do something more to change the run time of a DAG?

I know that in the past one way to solve this was to clear the tasks for this DAG from the metadata database and update the start_date, but I would rather avoid doing this again.

Does anyone have a suggestion?

David Batista asked May 18 '16 08:05



1 Answer

To schedule a DAG, Airflow just takes the last execution date and adds the schedule interval. Once that time has passed, it runs the DAG. You cannot simply update the start date. A simple way to make the change is to edit your start date and schedule interval, rename your DAG (e.g. xxxx_v2.py), and redeploy it.
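To make the rule above concrete, here is a minimal sketch in plain Python. It is not Airflow's actual scheduler code, only a simplified model of "last execution date plus schedule interval"; the function names and the dates are hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta


def next_run_time(last_execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """Simplified model: the next run is due at last execution date + interval."""
    return last_execution_date + schedule_interval


def is_due(last_execution_date: datetime, schedule_interval: timedelta, now: datetime) -> bool:
    """A run is triggered only once the computed due time has passed."""
    return now >= next_run_time(last_execution_date, schedule_interval)


# Hypothetical values: a daily DAG whose last recorded run was at 03:00.
last_run = datetime(2016, 5, 16, 3, 0)
daily = timedelta(days=1)

print(next_run_time(last_run, daily))  # 2016-05-17 03:00:00
print(is_due(last_run, daily, now=datetime(2016, 5, 17, 2, 0)))  # False
print(is_due(last_run, daily, now=datetime(2016, 5, 17, 7, 0)))  # True
```

In this model the due time is anchored to the recorded last run, not to whatever start date the DAG file now declares. That is consistent with the advice to rename the DAG: a new name has no recorded history, so scheduling starts fresh from the new start date and interval.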

p.magalhaes answered Nov 22 '22 02:11