As you can see in the image, Airflow is spending too much time between task executions.
It accounts for almost 30% of the DAG execution time.
I've changed the airflow.cfg file to:
job_heartbeat_sec = 1
scheduler_heartbeat_sec = 1
but I still have the same latency rate.
Why does it behave this way?
Thirty seconds is fairly high for inter-task latency. In well-tuned environments I've seen, ~4-6 seconds between a task and a dependent task has been a fairly reasonable lower bound, even for environments with many thousands of DAGs.
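To put a number on the inter-task latency before and after tuning, something like the following rough sketch may help. It assumes you have already collected the start and end time of each task in a linear chain (for example from the task instance details in the UI); the function names and the timings are made up for illustration and are not part of Airflow.

```python
from datetime import datetime, timedelta


def inter_task_gaps(runs):
    """Return the idle gaps between consecutive tasks in a linear chain.

    `runs` is a list of (start, end) datetime pairs, ordered by start time.
    """
    return [
        max(next_start - prev_end, timedelta(0))
        for (_, prev_end), (next_start, _) in zip(runs, runs[1:])
    ]


def gap_fraction(runs):
    """Fraction of the wall-clock time spent waiting between tasks."""
    total = runs[-1][1] - runs[0][0]
    waiting = sum(inter_task_gaps(runs), timedelta(0))
    return waiting / total


# Made-up timings for three sequential tasks (not taken from the question).
t0 = datetime(2020, 1, 1, 12, 0, 0)
example_runs = [
    (t0, t0 + timedelta(seconds=10)),
    (t0 + timedelta(seconds=15), t0 + timedelta(seconds=25)),
    (t0 + timedelta(seconds=30), t0 + timedelta(seconds=40)),
]
print(f"{gap_fraction(example_runs):.0%} of the run was inter-task latency")
```

Comparing this fraction across a few DAG runs should show whether a config change actually moved the latency.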
As you've already stated, raising the scheduler heartbeat frequency (that is, lowering scheduler_heartbeat_sec) and increasing the number of threads the scheduler has (scheduler.max_threads) are the best ways to decrease scheduling delays. If your tasks are blocked on other conditions (which you can check in the logs; set core.logging_level = DEBUG for even more information), then you should resolve those first.
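Before tuning further, it can be worth confirming which of these settings are actually present in the file Airflow reads. Below is a minimal sketch using only the standard library. The sample config text is hypothetical, and placing the two heartbeat keys under [scheduler] is my assumption, so check it against your own airflow.cfg.

```python
import configparser

# Settings discussed above as (section, key). The section for the two
# heartbeat keys is an assumption; verify it against your own airflow.cfg.
SETTINGS = [
    ("scheduler", "job_heartbeat_sec"),
    ("scheduler", "scheduler_heartbeat_sec"),
    ("scheduler", "max_threads"),
    ("core", "logging_level"),
]


def read_settings(cfg_text):
    """Return {"section.key": value or None} for the settings listed above."""
    # interpolation=None so that '%' characters in other values do not raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {
        f"{section}.{key}": parser.get(section, key, fallback=None)
        for section, key in SETTINGS
    }


# Hypothetical config fragment, for illustration only.
sample_cfg = """
[core]
logging_level = DEBUG

[scheduler]
job_heartbeat_sec = 1
scheduler_heartbeat_sec = 1
"""

for name, value in read_settings(sample_cfg).items():
    print(f"{name} = {value if value is not None else '<not set>'}")
```

To check a real installation, pass the contents of your airflow.cfg instead of sample_cfg. Note that this only reads the file and does not account for environment variable overrides.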
If you've adjusted both the scheduler heartbeat and the number of worker threads and you still see high scheduling delays, then you may need to consider using a more powerful machine.
It is by design. For instance, I use Airflow to run large workflows where some tasks can take a really long time. Airflow is not meant for tasks that take seconds to execute; it can be used for that, of course, but it might not be the most suitable tool.
With that said, there is not much that you can do, since you have already found the key settings to configure.
Additionally, you might want to try increasing the number of scheduler threads:
[scheduler]
max_threads = 4
This can alternatively be done by setting the environment variable:
AIRFLOW__SCHEDULER__MAX_THREADS=4
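The variable name above follows the pattern AIRFLOW__{SECTION}__{KEY}, with double underscores and upper case. As a small sketch, the helper below (airflow_env_var is a hypothetical name, not an Airflow function) builds the name for any option and sets it for the current process:

```python
import os


def airflow_env_var(section, key):
    """Build the AIRFLOW__{SECTION}__{KEY} name for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


name = airflow_env_var("scheduler", "max_threads")
os.environ[name] = "4"  # only affects this process and its children
print(name, "=", os.environ[name])
```

For the scheduler to pick the value up, the variable has to be set in the environment the scheduler itself is started from, not only in an interactive shell.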
However, do not count on the latency decreasing by much.