How to configure an Airflow DAG to run at a specific time on a daily basis?

Tags: python, airflow

How can I configure an Airflow DAG to execute at a specified time every day, no matter what happens, exactly like a cron job?

I know that similar behaviour can be obtained using TimeSensor, but in that case the timing depends on the sensor tasks, which might conflict with the DAG's execution time.

Example: with the sensor approach, if I have a sensor set for 00:15 but the DAG is executed later than that, my task is delayed. So even with the sensor approach I need to make sure that the DAG is executed at the right time.

So how do I make sure that the DAG is executed at the specified time?

samarth asked Feb 27 '16 10:02



1 Answer

To run a DAG every day at 2:30 AM, for example, you can do the following:

from datetime import datetime

from airflow import DAG

dag = DAG(
    dag_id='dag_id',
    # date from which the schedule is counted: 28-03-2017
    start_date=datetime(year=2017, month=3, day=28),
    # cron expression: minute 30, hour 2, every day (daily at 02:30)
    schedule_interval='30 2 * * *',
)

Before configuring the schedule, the interpretation of the cron expression can be verified and tested here: https://crontab.guru/
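
If you want to sanity-check the expression locally as well, here is a minimal sketch in plain Python (no Airflow needed). The helper name next_daily_runs is hypothetical, not part of Airflow or any cron library, and it only understands the simple daily form 'M H * * *' used above:

```python
from datetime import datetime, timedelta


def next_daily_runs(cron_expr, after, count=3):
    """Return the next `count` fire times strictly after `after`.

    Hypothetical helper for illustration: supports only daily
    expressions of the form 'M H * * *' with plain integer fields.
    """
    minute, hour, dom, month, dow = cron_expr.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("only daily 'M H * * *' expressions are supported")

    # First candidate: the requested time on the same day as `after`.
    candidate = after.replace(
        hour=int(hour), minute=int(minute), second=0, microsecond=0
    )
    # If that time is not in the future, move to the next day.
    if candidate <= after:
        candidate += timedelta(days=1)

    return [candidate + timedelta(days=i) for i in range(count)]


runs = next_daily_runs("30 2 * * *", datetime(2017, 3, 28), count=3)
for run in runs:
    print(run.isoformat())
```

This lists the cron tick times only (02:30 on each consecutive day). Keep in mind that, as far as I know, Airflow triggers the run for a schedule period once that period has ended, so with the start_date above I would expect the first actual run at 02:30 on 29-03-2017 rather than on the 28th.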

javed answered Sep 22 '22 05:09