Are there any best practices for deploying new DAGs to Airflow?
I saw a couple of comments on the Google forum stating that the DAGs are kept in a Git repository and synced periodically to a local location on the Airflow cluster.
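For context, here is roughly what I understand that setup to look like; this is just my own sketch, and the repository URL and DAGs path are placeholders, with the script meant to be run periodically (e.g. from cron) on each Airflow node:

```python
import os
import subprocess

DAGS_REPO = "https://example.com/our-team/airflow-dags.git"  # placeholder URL
DAGS_FOLDER = "/opt/airflow/dags"  # placeholder; should match dags_folder in airflow.cfg

def sync_dags():
    """Clone the DAGs repo on the first run, then fast-forward it on later runs."""
    if os.path.isdir(os.path.join(DAGS_FOLDER, ".git")):
        # Repo already present: pull the latest DAG files.
        subprocess.run(["git", "-C", DAGS_FOLDER, "pull", "--ff-only"], check=True)
    else:
        # First run on this node: clone the repo into the DAGs folder.
        subprocess.run(["git", "clone", DAGS_REPO, DAGS_FOLDER], check=True)

if __name__ == "__main__":
    sync_dags()
```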
Regarding this approach, I had a couple of questions
Any help here is highly appreciated. Let me know in case you need any further details.
Here is how we manage it for our team.
First, in terms of naming convention: each of our DAG file names matches the DAG Id declared inside the DAG itself (including the DAG version). This is useful because ultimately it is the DAG Id that you see in the Airflow UI, so you know exactly which file is behind each DAG.
For example, with a DAG like this:
```python
from airflow import DAG
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 12, 5, 23, 59),
    'email': ['[email protected]'],
    'email_on_failure': True
}

dag = DAG(
    'my_nice_dag-v1.0.9',  # update version whenever you change something
    default_args=default_args,
    schedule_interval="0,15,30,45 * * * *",
    dagrun_timeout=timedelta(hours=24),
    max_active_runs=1)
[...]
```
The name of the DAG file would be: my_nice_dag-v1.0.9.py
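As an illustration of the convention (this is only a possible helper, not part of our actual setup, and the DAGs folder path is a placeholder), a small check like this can flag files whose name no longer matches the DAG Id inside them:

```python
import os
from airflow.models import DagBag

def check_filenames_match_dag_ids(dags_folder="/opt/airflow/dags"):
    """Return (filename, dag_id) pairs where the file name does not match the DAG Id."""
    bag = DagBag(dag_folder=dags_folder, include_examples=False)
    mismatches = []
    for dag_id, dag in bag.dags.items():
        expected = dag_id + ".py"          # e.g. my_nice_dag-v1.0.9.py
        actual = os.path.basename(dag.fileloc)
        if actual != expected:
            mismatches.append((actual, dag_id))
    return mismatches

if __name__ == "__main__":
    for filename, dag_id in check_filenames_match_dag_ids():
        print("File %s does not match DAG Id %s" % (filename, dag_id))
```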
Benefits