How to rename a DAG in Apache Airflow

Tags:

airflow

I've been assessing Airflow the last few days as a possible replacement tool for our ETL workflows and found some interesting behaviour when a DAG is renamed in Airflow.

Suppose I have a DAG in a file called hello_world.py:

from datetime import datetime
from airflow import DAG

dag = DAG('hello_world', description='Simple DAG',
          schedule_interval='0 12 * * *',
          start_date=datetime(2017, 11, 1), catchup=True)

This DAG has been executed for 10 days in November. I then decide that I simply want to change the name of the DAG to 'yet_another_hello_world', e.g. in the same file hello_world.py:

dag = DAG('yet_another_hello_world', description='Simple DAG',
          schedule_interval='0 12 * * *',
          start_date=datetime(2017, 11, 1), catchup=True)

I'm simply renaming the job, not changing the business logic. When this is deployed into Airflow, it is automatically picked up and registered as a new job, so there are now 2 jobs visible in the DAG view:

  • hello_world
  • yet_another_hello_world

Because of catchup=True in the DAG definition, the scheduler automatically sees this change and registers a new job, yet_another_hello_world. It then backfills the missing executions from the 1st of November, while leaving the existing hello_world job intact.
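
To make the backfill concrete, here is a rough sketch of which runs count as "missing" for a DAG id that has no run history yet. It is a simplified model of a daily-at-noon schedule with catchup=True, not Airflow's actual scheduler code; the helper name backfill_execution_dates and the deploy time used below are hypothetical.

```python
from datetime import datetime, timedelta


def backfill_execution_dates(start_date, now):
    """Simplified model of catchup=True for the cron schedule '0 12 * * *'.

    Returns the execution dates that a DAG id with no previous runs would
    be asked to run. Assumption: a run is started only once the daily
    interval it covers has fully elapsed.
    """
    interval = timedelta(days=1)
    # First schedule point at or after start_date: 12:00 on the start day.
    execution_date = start_date.replace(hour=12, minute=0, second=0, microsecond=0)
    if execution_date < start_date:
        execution_date += interval

    dates = []
    while execution_date + interval <= now:
        dates.append(execution_date)
        execution_date += interval
    return dates


# Hypothetical deploy time of the renamed DAG: 11 November 2017, 13:00.
runs = backfill_execution_dates(datetime(2017, 11, 1), datetime(2017, 11, 11, 13, 0))
print(len(runs), runs[0], runs[-1])
```

Under these assumptions the renamed DAG gets the same set of daily runs that hello_world already completed, because the new id starts with an empty history.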

Ultimately, I want this to be a rename of the existing job, without preserving the old hello_world job. Is there a way to indicate to Airflow that this is a simple rename?

vcetinick asked Dec 14 '17 00:12



1 Answer

As a best practice, it is recommended to create a new DAG file whenever you want to change a DAG's name, schedule_interval or start_date.

sdikby answered Dec 08 '22 05:12
