 

Airflow tasks get stuck at "queued" status and never start running


I'm using Airflow v1.8.1 and run all components (worker, web, flower, scheduler) on Kubernetes & Docker. I use the Celery executor with Redis, and my tasks look like this:

(start) -> (do_work_for_product1)
        ├  -> (do_work_for_product2)
        ├  -> (do_work_for_product3)
        ├  …

So the start task has multiple downstream tasks. I set up the concurrency-related configuration as below:

parallelism = 3
dag_concurrency = 3
max_active_runs = 1
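For reference, here is a minimal sketch (assumed names and dates, not the asker's actual code) of a DAG shaped like the one above, with the matching per-DAG limits:

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag = DAG(
    dag_id="product_fanout",          # hypothetical name
    start_date=datetime(2017, 8, 1),
    schedule_interval=None,
    concurrency=3,                    # mirrors dag_concurrency = 3
    max_active_runs=1,
)

start = DummyOperator(task_id="start", dag=dag)
for product in ["product1", "product2", "product3"]:
    work = DummyOperator(task_id="do_work_for_%s" % product, dag=dag)
    start.set_downstream(work)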

Then, when I run this DAG manually (not sure whether it also happens on scheduled runs), some downstream tasks get executed, but others get stuck at "queued" status.

If I clear the task from the Admin UI, it gets executed. There is no worker log for it (after processing the first few downstream tasks, the worker just stops outputting any log).

Web server's log (not sure if the worker exiting is related):

/usr/local/lib/python2.7/dist-packages/flask/exthook.py:71: ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use flask_cache instead.
  .format(x=modname), ExtDeprecationWarning
[2017-08-24 04:20:56,496] [51] {models.py:168} INFO - Filling up the DagBag from /usr/local/airflow_dags
[2017-08-24 04:20:57 +0000] [27] [INFO] Handling signal: ttou
[2017-08-24 04:20:57 +0000] [37] [INFO] Worker exiting (pid: 37)

There is no error log on the scheduler either, and the number of tasks that get stuck changes each time I try this.

Because I also use Docker, I'm wondering if this is related: https://github.com/puckel/docker-airflow/issues/94 But so far, no clue.

Has anyone faced a similar issue, or does anyone have an idea of what I can investigate here?

asked Aug 24 '17 by Norio Akagi


People also ask

What is queue in Airflow?

queue is an attribute of BaseOperator, so any task can be assigned to any queue. The default queue for the environment is defined in airflow.cfg's celery -> default_queue. This defines the queue that tasks get assigned to when not specified, as well as which queue Airflow workers listen to when started.
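For example, a task can be pinned to a specific queue via that argument. A minimal sketch with a hypothetical queue name:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(dag_id="queue_demo", start_date=datetime(2017, 8, 1), schedule_interval=None)

heavy = BashOperator(
    task_id="do_heavy_work",
    bash_command="echo working",
    queue="cpu_intensive",   # hypothetical queue name
    dag=dag,
)

A worker then subscribes to that queue, e.g. with airflow worker -q cpu_intensive on the 1.x CLI (newer releases use airflow celery worker -q instead).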

Which executor is best for Airflow?

Airflow comes configured with the SequentialExecutor by default, which is a local executor and the safest option for execution, but we strongly recommend changing this to the LocalExecutor for small, single-machine installations, or to one of the remote executors for a multi-machine/cloud installation.
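Switching executors is a single setting in airflow.cfg; for example, for the LocalExecutor:

[core]
executor = LocalExecutor

(The Celery setup in the question additionally needs the broker and result-backend settings in the [celery] section.)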

How do I know if the Airflow scheduler is running?

At startup, the scheduler creates a BaseJob entry with information about the host and a timestamp (heartbeat), and then updates it regularly. You can use this to check whether the scheduler is working correctly, via the airflow jobs check command. On failure, the command exits with a non-zero error code.
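For example (this subcommand exists in the Airflow 2.x CLI, not in the 1.8 series used in the question):

airflow jobs check --job-type SchedulerJob

A zero exit code means a recent scheduler heartbeat was found; a non-zero exit code means it was not.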

What is concurrency in Airflow?

concurrency: This is the maximum number of task instances allowed to run concurrently across all active DAG runs for a given DAG. This lets one DAG run 32 tasks at once, while another DAG might only be able to run 16 tasks at once.
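A sketch with hypothetical DAG ids (note that newer Airflow releases renamed this argument to max_active_tasks):

from datetime import datetime
from airflow import DAG

big_dag = DAG(dag_id="big_dag", concurrency=32, start_date=datetime(2017, 8, 1))
small_dag = DAG(dag_id="small_dag", concurrency=16, start_date=datetime(2017, 8, 1))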


2 Answers

Tasks getting stuck is, most likely, a bug. At the moment (<= 1.9.0alpha1) it can happen when a task cannot even start up on the (remote) worker. This happens, for example, with an overloaded worker or missing dependencies.

This patch should resolve that issue.

It is worth investigating why your tasks do not reach the RUNNING state: setting itself to this state is the first thing a task does. Normally the worker logs before it starts executing, and it also reports any errors, so you should be able to find entries for this in the task log.

edit: As was mentioned in the comments on the original question, one example of Airflow not being able to run a task is when it cannot write to required locations. This makes it unable to proceed, so tasks get stuck. The patch fixes this by having the scheduler fail the task.
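One quick way to test the cannot-write case is to probe the log directory as the worker's user. A minimal sketch, assuming a log path that you should adjust to your airflow.cfg's base_log_folder:

import os

base_log_folder = "/usr/local/airflow/logs"  # assumption; check airflow.cfg
probe = os.path.join(base_log_folder, ".write_test")

try:
    # Try to create and remove a file exactly where task logs would go.
    with open(probe, "w") as f:
        f.write("ok")
    os.remove(probe)
    print("worker user can write to %s" % base_log_folder)
except (IOError, OSError) as exc:
    print("cannot write to %s: %s" % (base_log_folder, exc))

Run it inside the worker container, as the same user the worker runs as.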

answered Oct 19 '22 by Bolke de Bruin


I have been working with the same puckel Docker image. My issue was resolved by:

Replacing

 result_backend = db+postgresql://airflow:airflow@postgres/airflow 

with

celery_result_backend = db+postgresql://airflow:airflow@postgres/airflow 

which I think has since been updated in the latest pull by puckel. The change was reverted around Feb 2018, and your comment was made in January.

answered Oct 19 '22 by Rohan Sawant