My dag takes about 50seconds to parse, I only use external triggers to start dag runs, no schedules. I notice airflow wants to fill the dagbag a lot --> On every trigger_dag command AND in the background it keeps checking the dags folder AND creating .pyc files seemingly instantly once new .py deployed.
Is there anyway I can deploy my cluster and get dags filled once! Then for the next 2 weeks get dagruns starting instantly on any trigger_dag (right now takes 50 seconds just to fill the dagbag before starting). I have no need to update dag definitions within the 2 weeks.
When creating a new DAG, you probably want to set a global start_date for your tasks. This can be done by declaring your start_date directly in the DAG() object. The first DagRun to be created will be based on the min(start_date) for all your tasks.
Airflow scans the dags_folder for new DAGs every dag_dir_list_interval , which defaults to 5 minutes but can be modified. You might have to wait until this interval has passed before a new DAG appears in the UI.
According to the official Airflow docs, The task instances directly upstream from the task need to be in a success state. Also, if you have set depends_on_past=True, the previous task instance needs to have succeeded (except if it is the first run for that task).
concurrency : This is the maximum number of task instances allowed to run concurrently across all active DAG runs for a given DAG. This allows you to set 1 DAG to be able to run 32 tasks at once, while another DAG might only be able to run 16 tasks at once.
50 seconds is an incredibly huge amount of time for DAG instantiation. Looks like you are using a big piece of code (or just long-working) in your DAG file. It is very bad practice:
Note: This means all top level code (ie. anything that isn't defining the DAG) in a DAG file will get run each scheduler heartbeat. Try to avoid top level code to your DAG file unless absolutely necessary.
Airflow works exactly as you described. It is why you should treat your Python files in your DAG folder mostly as configuration files (with some programmatical capabilities). You can't change it with any magic config keys or something like it. This behaviour is the core of Airflow.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With