 

Airflow 1.9.0 is queuing but not launching tasks

Airflow is randomly not running queued tasks; some tasks don't even get the queued status. I keep seeing the following in the scheduler logs:

 [2018-02-28 02:24:58,780] {jobs.py:1077} INFO - No tasks to consider for execution. 

I do see tasks in the database that either have no status or are in the queued status, but they never get started.

The Airflow setup is running https://github.com/puckel/docker-airflow on ECS with Redis. There are 4 scheduler threads and 4 Celery worker tasks. The tasks that are not running show up in the queued state (grey icon); when hovering over the task icon the operator is null, and the task details say:

    All dependencies are met but the task instance is not running. In most cases this just means that the task will probably be scheduled soon unless: - The scheduler is down or under heavy load

Metrics on the scheduler do not show heavy load. The DAG is very simple, with 2 independent tasks that depend only on their last run. There are also tasks in the same DAG that are stuck with no status (white icon).

An interesting thing to notice is that when I restart the scheduler, the tasks change to the running state.

asked Feb 28 '18 by l0n3r4n83r


People also ask

How do I know if my scheduler is running Airflow?

CLI check for the scheduler: the scheduler creates an entry in the BaseJob table with information about the host and a timestamp (heartbeat) at startup, and then updates it regularly. You can use this to check whether the scheduler is working correctly. To do this, you can use the airflow jobs check command. On failure, the command will exit with a non-zero error code.
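
The same heartbeat can also be inspected directly in the metadata database. A minimal sketch, assuming the default schema with a job table holding job_type, state, and latest_heartbeat columns (the connection string is a placeholder for your own setup):

    # Hedged sketch: read the scheduler heartbeat straight from the metadata DB.
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://airflow:airflow@localhost/airflow")  # placeholder DSN

    with engine.connect() as conn:
        row = conn.execute(text(
            "SELECT state, latest_heartbeat FROM job "
            "WHERE job_type = 'SchedulerJob' "
            "ORDER BY latest_heartbeat DESC LIMIT 1"
        )).fetchone()

    if row is None:
        print("No SchedulerJob row found - the scheduler has probably never started")
    else:
        # A 'running' state with a heartbeat in the last minute or two usually
        # means the scheduler is alive; a stale heartbeat means it has stalled.
        print(f"scheduler state: {row[0]}, latest heartbeat: {row[1]}")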

How do I get task status in Airflow?

Start by grabbing the task_ids and state of the task you're interested in with a db call. That should give you the state (and name, for reference) of the task you're trying to monitor. State is stored as a simple lowercase string.
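
As an illustration of that DB call, here is a minimal sketch using Airflow's own ORM session (1.x-style imports; the dag_id and date below are made up for the example):

    # Hedged sketch: query task_id and state for a DAG's task instances.
    from datetime import datetime

    from airflow import settings
    from airflow.models import TaskInstance

    session = settings.Session()
    rows = (
        session.query(TaskInstance.task_id, TaskInstance.state)
        .filter(TaskInstance.dag_id == "my_dag")                       # hypothetical dag_id
        .filter(TaskInstance.execution_date >= datetime(2018, 2, 27))  # optional narrowing
        .all()
    )
    for task_id, state in rows:
        # state is a simple lowercase string such as "queued", "running",
        # "success", or None if the task has not been given a state yet
        print(task_id, state)
    session.close()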


2 Answers

Airflow can be a bit tricky to set up.

  • Do you have the airflow scheduler running?
  • Do you have the airflow webserver running?
  • Have you checked that all DAGs you want to run are set to On in the web UI?
  • Do all the DAGs you want to run have a start date which is in the past?
  • Do all the DAGs you want to run have a proper schedule which is shown in the web UI?
  • If nothing else works, you can use the web UI to click on the DAG, then on Graph View. Now select the first task and click on Task Instance. In the Task Instance Details section you will see why a task is waiting or not running.

For instance, I once had a DAG which was wrongly set to depends_on_past: True, which prevented the current instance from starting correctly.
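
To make the checklist concrete, here is a minimal sketch of a DAG that ticks those boxes (1.x-style imports; the dag_id and dates are made up): a start date in the past, an explicit schedule, and depends_on_past left at False.

    # Hedged sketch: a DAG with a past start_date, a visible schedule_interval,
    # and depends_on_past disabled so an earlier run cannot block the current one.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator

    default_args = {
        "owner": "airflow",
        "depends_on_past": False,   # True here can silently hold back new runs
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    dag = DAG(
        dag_id="example_checklist_dag",
        default_args=default_args,
        start_date=datetime(2018, 1, 1),   # must be in the past
        schedule_interval="@daily",        # shows up in the web UI's Schedule column
        catchup=False,
    )

    t1 = DummyOperator(task_id="task_one", dag=dag)
    t2 = DummyOperator(task_id="task_two", dag=dag)
    t1 >> t2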

There is also a great resource directly in the docs, which has a few more hints: "Why isn't my task getting scheduled?".

answered Sep 30 '22 by tobi6


I've been running a fork of the puckel/docker-airflow repo as well, mostly on Airflow 1.8, for about a year with 10M+ task instances. I think the issue persists in 1.9, but I'm not positive.

For whatever reason, there seems to be a long-standing issue with the Airflow scheduler where performance degrades over time. I've reviewed the scheduler code, but I'm still unclear on what exactly happens differently on a fresh start to kick it back into scheduling normally. One major difference is that scheduled and queued task states are rebuilt.

Scheduler Basics in the Airflow wiki provides a concise reference on how the scheduler works and its various states.

Most people work around the scheduler's diminishing throughput by restarting it regularly. I've found success with a 1-hour interval personally, but have seen intervals as frequent as every 5-10 minutes used too. Your task volume, task duration, and parallelism settings are worth considering when experimenting with a restart interval.

For more info see:

  • Airflow: Tips, Tricks, and Pitfalls (section "The scheduler should be restarted frequently")
  • Bug 1286825 - Airflow scheduler stopped working silently
  • Airflow at WePay (section "Restart everything when deploying DAG changes.")

This used to be addressed by restarting every X runs using the SCHEDULER_RUNS config setting, although that setting was recently removed from the default systemd scripts.
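
If you want to reproduce that behaviour yourself, one option is a crude supervisor loop around airflow scheduler -n (the --num_runs flag tells the scheduler to exit after N scheduling loops). This is a minimal sketch, not a production recipe; in practice systemd, supervisord, or your container orchestrator would do the restarting:

    # Hedged sketch of the old SCHEDULER_RUNS behaviour: run the scheduler for a
    # limited number of scheduling loops, let it exit, and start it again so its
    # internal state is rebuilt periodically.
    import subprocess
    import time

    SCHEDULER_RUNS = 5  # loops per scheduler process, as in the old systemd unit

    while True:
        # Blocks until the scheduler exits after SCHEDULER_RUNS loops (or crashes).
        subprocess.run(["airflow", "scheduler", "-n", str(SCHEDULER_RUNS)])
        time.sleep(5)  # brief pause before restarting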

You might also consider posting to the Airflow dev mailing list. I know this has been discussed there a few times and one of the core contributors may be able to provide additional context.

Related Questions

  • Airflow tasks get stuck at "queued" status and never gets running (especially see Bolke's answer here)
  • Jobs not executing via Airflow that runs celery with RabbitMQ
answered Sep 30 '22 by Taylor D. Edmiston