I've been assessing Airflow over the last few days as a possible replacement for our ETL workflows and found some interesting behaviour when a DAG is renamed.
Say I have a DAG in a file called hello_world.py:
from datetime import datetime
from airflow import DAG

dag = DAG('hello_world', description='Simple DAG',
          schedule_interval='0 12 * * *',
          start_date=datetime(2017, 11, 1), catchup=True)
This DAG has been executed for 10 days in November. I then decide that I simply want to rename it to 'yet_another_hello_world', e.g. in the same file hello_world.py:
dag = DAG('yet_another_hello_world', description='Simple DAG',
          schedule_interval='0 12 * * *',
          start_date=datetime(2017, 11, 1), catchup=True)
I'm simply renaming the job, not changing the business logic etc. When this is deployed into Airflow, it is automatically picked up and registered as a new job, so there are now 2 jobs visible in the DAG view.
Because of catchup=True in the DAG definition, the scheduler automatically sees this change, registers a new job yet_another_hello_world, and then backfills the missing executions from the 1st of November. It also leaves the existing hello_world job intact.
Ultimately, I want this to be a rename of the existing job rather than a new job alongside the old hello_world. Is there a way to indicate to Airflow that this is a simple rename?
As a best practice, it is always recommended to create a new DAG file when you want to change your DAG's name, schedule_interval or start_date. Airflow identifies a DAG solely by its dag_id, so a rename looks exactly like the creation of a brand-new DAG, and the runs recorded against the old dag_id stay in the metadata database until you remove them.
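There is no built-in rename operation, so the practical pattern is to define the DAG under its new dag_id and then retire the old one. Below is a minimal sketch of that pattern; the file name yet_another_hello_world.py is an assumption on my part, and catchup=False is only shown as an option for the case where you do not want the scheduler to re-run the dates that hello_world already covered.
# yet_another_hello_world.py - hypothetical new file for the renamed DAG
from datetime import datetime
from airflow import DAG

dag = DAG('yet_another_hello_world', description='Simple DAG',
          schedule_interval='0 12 * * *',
          start_date=datetime(2017, 11, 1),
          # Optional: catchup=False stops the renamed DAG from
          # backfilling runs the old hello_world already completed.
          catchup=False)
After the rename, the old hello_world entry (and its run history) lingers until you delete it: from the UI in recent versions, or with the CLI (airflow dags delete hello_world on Airflow 2.x, airflow delete_dag hello_world on the 1.10 series). Either way the old history is discarded rather than transferred to the new dag_id, which is exactly why the new-file-per-rename convention is recommended.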