
Unable to run Airflow Tasks due to execution date and start date


Whenever I try to run a DAG, it enters the running state but its tasks never run. I have set start_date to datetime.today() and schedule_interval to "* * * * *". Manually triggering a run starts the DAG, but the task does not run because of:

The execution date is 2017-09-13T00:00:00 but this is before the task's start date 2017-09-13T16:20:30.363268.

I have tried various schedule intervals (such as a specific time each day), and I have tried both waiting for the DAG to be triggered by the scheduler and triggering it manually. Nothing seems to work.

Branko asked Sep 13 '17 20:09





1 Answer

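First of all, start_date is a task attribute, but in practice it is usually set in default_args and therefore behaves like a DAG attribute.

The message is clear: if a task's execution_date is before the task's start_date, the task cannot be scheduled. With start_date set to datetime.today(), the start date is recomputed every time the DAG file is parsed, so it keeps moving forward and can stay later than the execution_date of every run. Set start_date to a fixed, earlier value instead: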

import datetime

default_args = {
    'start_date': datetime.datetime(2019, 1, 1)  # hard coded date
}

or

import airflow.utils.dates  # import the submodule explicitly

default_args = {
    'start_date': airflow.utils.dates.days_ago(7)  # 7 days ago
}
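To see why datetime.today() fails in the question, the comparison behind the error message can be mimicked with plain datetime objects. This is only a minimal sketch, assuming a run is refused whenever its execution date is earlier than the task's start date; can_schedule is a hypothetical helper, not an Airflow function, and the fixed start date is an assumed example value.

```python
from datetime import datetime


def can_schedule(execution_date: datetime, start_date: datetime) -> bool:
    """Hypothetical stand-in for the check described by the error message."""
    return execution_date >= start_date


# Both values are taken from the error message in the question.
execution_date = datetime(2017, 9, 13, 0, 0, 0)
moving_start = datetime(2017, 9, 13, 16, 20, 30, 363268)  # what datetime.today() returned

# Assumed example of a fixed start date in the past.
fixed_start = datetime(2017, 1, 1)

print(can_schedule(execution_date, moving_start))  # False
print(can_schedule(execution_date, fixed_start))   # True
```

With a moving start date the comparison fails for that run, while a fixed date in the past lets it through.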

From the Airflow documentation:

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.

Let's Repeat That: The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.

So, when you schedule your DAG, every dag_run's execution_date will be earlier than the time the run actually starts. For a daily schedule, the difference is 24 hours.

We can say: start time = execution_date + schedule_interval
(start time here is not start_date; it is just the time at which the DAG run starts)
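The relation can be checked with ordinary datetime arithmetic. A small sketch using the daily example from the documentation quote; run_start_time is a hypothetical helper name, and the result is the earliest moment the run can start, not a guaranteed trigger time.

```python
from datetime import datetime, timedelta


def run_start_time(execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """Earliest time a run can start: the end of the period it covers."""
    return execution_date + schedule_interval


execution_date = datetime(2016, 1, 1)   # the run stamped 2016-01-01
daily = timedelta(days=1)               # schedule_interval of one day

print(run_start_time(execution_date, daily))  # 2016-01-02 00:00:00
```

The run stamped 2016-01-01 therefore starts at the beginning of 2016-01-02, which matches "soon after 2016-01-01T23:59".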

mustafagok answered Feb 07 '23 01:02
