Airflow scheduler stuck

Tags: airflow, airflow-scheduler

I'm testing Airflow. After triggering a (seemingly) large number of DAGs at the same time, the scheduler fails to schedule anything and starts killing processes. These are the logs it prints:

[2019-08-29 11:17:13,542] {scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:13,544] {scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:44,614] {scheduler_job.py:214} WARNING - Killing PID 2992
[2019-08-29 11:17:44,614] {scheduler_job.py:214} WARNING - Killing PID 2992
[2019-08-29 11:18:15,692] {scheduler_job.py:214} WARNING - Killing PID 5174
[2019-08-29 11:18:15,693] {scheduler_job.py:214} WARNING - Killing PID 5174
[2019-08-29 11:18:46,765] {scheduler_job.py:214} WARNING - Killing PID 22410
[2019-08-29 11:18:46,766] {scheduler_job.py:214} WARNING - Killing PID 22410
[2019-08-29 11:19:17,845] {scheduler_job.py:214} WARNING - Killing PID 42177
[2019-08-29 11:19:17,846] {scheduler_job.py:214} WARNING - Killing PID 42177
...

I'm using the LocalExecutor with a PostgreSQL backend DB. This seems to happen only after I trigger a large number (>100) of DAGs at about the same time using external triggering, as in:

airflow trigger_dag DAG_NAME

After it finishes killing whatever processes it is killing, it starts executing all of the tasks properly. I don't even know what these processes were, as I can't see them after they are killed.

Has anyone encountered this kind of behavior? Any idea why it would happen?

asked Aug 29 '19 15:08

GuD

2 Answers

In my case, the cause was a DAG file that created a very large number of DAGs dynamically.

The "dagbag_import_timeout" config variable, which controls "How long before timing out a python file import while filling the DagBag", was set to its default value of 30 seconds. As a result, the process filling the DagBag kept timing out.
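
One way to check whether this applies to a given setup is to time the import of the DAG file directly and compare it with the timeout. The following is a minimal sketch and not part of Airflow: `time_import` and `exceeds_timeout` are hypothetical helper names, and the 30-second constant is simply the default mentioned above.

```python
import importlib.util
import time

# Seconds; the default value of dagbag_import_timeout mentioned above.
DEFAULT_DAGBAG_IMPORT_TIMEOUT = 30.0


def time_import(path):
    """Import the Python file at `path` once and return the elapsed seconds."""
    # "dag_file_under_test" is an arbitrary module name used only for this check.
    spec = importlib.util.spec_from_file_location("dag_file_under_test", path)
    module = importlib.util.module_from_spec(spec)
    start = time.perf_counter()
    # Runs the file's top-level code, which is where dynamic DAG creation happens.
    spec.loader.exec_module(module)
    return time.perf_counter() - start


def exceeds_timeout(elapsed, timeout=DEFAULT_DAGBAG_IMPORT_TIMEOUT):
    """Return True if the measured import time is longer than the timeout."""
    return elapsed > timeout
```

For example, if `exceeds_timeout(time_import("/path/to/dags/my_dag_file.py"))` returns True (the path is a placeholder), that would suggest raising "dagbag_import_timeout" in the [core] section of airflow.cfg, or through the AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT environment variable, or doing less work at import time. A measurement taken by hand may differ from what the scheduler sees under load.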

answered Sep 30 '22 09:09

GuD

I've had a very similar issue. My DAG was of the same nature (a file that generates many DAGs dynamically). I tried the suggested solution, but it didn't work: the value was already fairly high (60 seconds), and increasing it to 120 didn't resolve my issue.

Posting what worked for me in case someone else has a similar issue.

I came across this JIRA ticket: https://issues.apache.org/jira/browse/AIRFLOW-5506

which helped me resolve my issue: I disabled the SLA configuration, and then all my tasks started to run!
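
The answer does not say exactly how the SLA was disabled. One possible reading, assuming the SLA is set through an `sla` entry in the `default_args` shared by the generated DAGs, is to drop that entry before the DAGs are built. This is only a sketch: `without_sla` is a hypothetical helper and the `default_args` values are illustrative.

```python
from datetime import timedelta


def without_sla(default_args):
    """Return a copy of default_args with any 'sla' entry removed."""
    return {key: value for key, value in default_args.items() if key != "sla"}


# Illustrative values only; real default_args will differ.
default_args = {"owner": "airflow", "retries": 1, "sla": timedelta(hours=1)}

# Pass this copy to the generated DAGs instead of the original dict.
default_args_no_sla = without_sla(default_args)
```

Some Airflow versions also expose a `check_slas` option in the [core] section of airflow.cfg that turns SLA checking off globally; whether it is available depends on the installed version.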

There may be other solutions as well, as other comments in that ticket suggest.

For the record, my issue started after I enabled a lot of such DAGs (around 60?) that had been disabled for a few months. To be honest, I'm not sure how the SLA affects this from a technical perspective, but it did.

answered Sep 30 '22 08:09

babis21
