Whenever I try to run a DAG, it enters the running state, but its tasks do not run. I have set my start date to datetime.today() and my schedule interval to "* * * * *". Manually triggering a run starts the DAG, but the task does not run because of:
The execution date is 2017-09-13T00:00:00 but this is before the task's start date 2017-09-13T16:20:30.363268.
I have tried various combinations of schedule intervals (such as a specific time each day) as well as waiting for the dag to be triggered and manual triggers. Nothing seems to work.
So its execution date also falls on the day it is triggered, because it is scheduled at minute 50 of each hour. In Airflow, @hourly corresponds to 0 * * * *, so its schedule is similar: it is triggered at minute 0 of each hour. In the docs, however, its execution date is given as 2016-01-01.
The start date is the date at which your DAG starts being scheduled. This date can be in the past or in the future. Think of the start date as the start of the data interval you want to process, for example 01/01/2021 00:00. In addition to the start date, you need a schedule interval.
When creating a new DAG, you probably want to set a global start_date for your tasks. This can be done by declaring your start_date directly in the DAG() object. The first DagRun to be created will be based on the min(start_date) for all your tasks.
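As a rough illustration of the min(start_date) rule, here is a plain-Python sketch that does not use Airflow at all. The task names and dates are hypothetical, and this only mimics the rule as described above rather than showing how the scheduler implements it:

```python
from datetime import datetime

# Hypothetical per-task start dates (names and values are made up for illustration).
task_start_dates = {
    "extract": datetime(2021, 1, 1),
    "transform": datetime(2021, 1, 3),
    "load": datetime(2021, 1, 2),
}

# The first DagRun is based on the earliest start_date among the tasks.
first_run_basis = min(task_start_dates.values())
print(first_run_basis)  # 2021-01-01 00:00:00
```

Declaring a single start_date on the DAG (or in default_args) avoids having to reason about differing per-task values.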
First of all, start_date is a task attribute, but in general it is set in default_args and used like a DAG attribute.
The message is very clear: if a task's execution_date is before the task's start_date, it cannot be scheduled. You can set start_date to an earlier value:
import datetime
default_args = {
'start_date': datetime.datetime(2019, 1, 1) # hard coded date
}
or
from airflow.utils.dates import days_ago # import the helper explicitly
default_args = {
'start_date': days_ago(7) # 7 days ago
}
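To see why a start_date of datetime.today() produces the error in the question, the comparison can be sketched in plain Python. The function is_runnable is a hypothetical helper that only mirrors the rule stated in the error message, not Airflow's actual code. The first two timestamps come from the error message; the fixed date is an assumed example:

```python
from datetime import datetime

def is_runnable(execution_date: datetime, start_date: datetime) -> bool:
    """Return True when the execution date is not before the task's start date."""
    return execution_date >= start_date

# Values taken from the error message in the question.
execution_date = datetime(2017, 9, 13, 0, 0, 0)
dynamic_start = datetime(2017, 9, 13, 16, 20, 30, 363268)  # what datetime.today() evaluated to

# Assumed example of a hard-coded start date in the past.
fixed_start = datetime(2017, 1, 1)

print(is_runnable(execution_date, dynamic_start))  # False: execution date precedes start date
print(is_runnable(execution_date, fixed_start))    # True
```

Because datetime.today() is re-evaluated each time the DAG file is parsed, the start date keeps moving forward and stays ahead of the execution date; a fixed date in the past does not.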
From the Airflow documentation:
Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.
Let’s repeat that: the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
So, when you schedule your DAG, any dag_run's execution_date will be earlier than its start time. For a daily schedule, the difference is 24 hours.
We can say start time = execution_date + schedule_interval (here, start time is not start_date; it is just the time at which the DAG run starts).
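The relationship start time = execution_date + schedule_interval can be checked with plain datetime arithmetic. This is a minimal sketch that assumes a fixed-length interval expressed as a timedelta; run_start_time is a hypothetical helper, not an Airflow function:

```python
from datetime import datetime, timedelta

def run_start_time(execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """Earliest moment the run can start: the end of the period it covers."""
    return execution_date + schedule_interval

execution_date = datetime(2016, 1, 1)

# Daily schedule: the run stamped 2016-01-01 starts once that day has ended.
print(run_start_time(execution_date, timedelta(days=1)))   # 2016-01-02 00:00:00

# Hourly schedule: the run stamped 00:00 starts at 01:00.
print(run_start_time(execution_date, timedelta(hours=1)))  # 2016-01-01 01:00:00
```

The daily case matches the documentation excerpt above, where the run stamped 2016-01-01 is triggered soon after 2016-01-01T23:59.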