
Example DAG gets stuck in "running" state indefinitely

Tags:

airflow

In my first foray into Airflow, I am trying to run one of the example DAGs that come with the installation. This is v1.8.0. Here are my steps:

$ airflow trigger_dag example_bash_operator
[2017-04-19 15:32:38,391] {__init__.py:57} INFO - Using executor SequentialExecutor
[2017-04-19 15:32:38,676] {models.py:167} INFO - Filling up the DagBag from /Users/gbenison/software/kludge/airflow/dags
[2017-04-19 15:32:38,947] {cli.py:185} INFO - Created <DagRun example_bash_operator @ 2017-04-19 15:32:38: manual__2017-04-19T15:32:38, externally triggered: True>
$ airflow dag_state example_bash_operator '2017-04-19 15:32:38'
[2017-04-19 15:33:12,918] {__init__.py:57} INFO - Using executor SequentialExecutor
[2017-04-19 15:33:13,229] {models.py:167} INFO - Filling up the DagBag from /Users/gbenison/software/kludge/airflow/dags
running

The DAG state remains "running" for a long time (at least 20 minutes by now), although from a quick inspection this DAG should take only a matter of seconds to complete. How can I troubleshoot this? How can I see which step it is stuck on?

asked Apr 19 '17 by gcbenison

People also ask

How do I stop my DAG from running in Airflow?

Note that if the DAG is currently running, the Airflow scheduler will re-run any tasks you delete. So either stop the DAG first by changing its state, or stop the scheduler (if you are running in a test environment).
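For example, a minimal sketch assuming the Airflow 1.x CLI and the example DAG from the question, pausing a DAG from the command line stops the scheduler from scheduling new task instances for it:

$ airflow pause example_bash_operator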

How do you run a DAG every 5 minutes?

schedule_interval is defined as a DAG argument and accepts either a cron expression as a str or a datetime.timedelta object. Following the linked documentation on cron expressions, you can specify it as */5 * * * * to run the DAG every 5 minutes.

What parameter can you use to stop the DAG run if it is not completed in a given interval of time?

The scheduler, by default, will kick off a DAG Run for any data interval that has not been run since the last data interval (or has been cleared). This concept is called Catchup.


2 Answers

To run any DAGs, you need to make sure two processes are running:

  • airflow webserver
  • airflow scheduler

If you only have airflow webserver running, the UI will show DAGs as running, but if you click into a DAG, none of its tasks are actually running or scheduled; they sit in a null state, waiting to be picked up by airflow scheduler. If airflow scheduler is not running, you'll be stuck in this state forever, as the tasks are never picked up for execution.
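As a minimal sketch, assuming a default local install with $AIRFLOW_HOME already configured, start both processes (for example in two separate terminals) and then re-check the DAG run from the question:

$ airflow webserver
$ airflow scheduler
$ airflow dag_state example_bash_operator '2017-04-19 15:32:38'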

Additionally, make sure that the toggle button in the DAGs view is switched to 'ON' for the particular DAG. Otherwise it will not get picked up by the scheduler, even if you trigger it manually.

answered Oct 05 '22 by Ladislav Indra


I too recently started using Airflow and my DAGs kept running endlessly. Your DAG may be set to 'paused' without you realizing it; in that case the scheduler will not schedule new task instances, so when you trigger the DAG it just looks like it is running forever.

There are a few solutions:

1) In the Airflow UI, toggle the button to the left of the DAG from 'Off' to 'On'. Off means the DAG is paused, so On allows the scheduler to pick it up and complete the run. (This fixed my initial issue.)

2) In your airflow.cfg file, dags_are_paused_at_creation = True is the default, so all new DAGs you create are paused from the start. Change this to False and future DAGs you create will be good to go right away; see the sketch after this list. (I had to restart the webserver and scheduler for changes to airflow.cfg to be recognized.)

3) Use the command line: $ airflow unpause [dag_id] (see the example after this list). Documentation: https://airflow.apache.org/cli.html#unpause
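As a minimal sketch of option 2), assuming a default install where airflow.cfg lives under $AIRFLOW_HOME, the setting sits in the [core] section:

[core]
dags_are_paused_at_creation = False

And for option 3), using the DAG id from the question, unpausing from the command line looks like:

$ airflow unpause example_bash_operator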

answered Oct 05 '22 by jbreezybaby