 

Why does the Airflow scheduler keep running my DAG file?

I followed the tutorial: I created a folder $AIRFLOW_HOME/dags, put the tutorial DAG Python file there, and then started the airflow scheduler. The DAG is paused by default, but the output of airflow scheduler shows run after run trying to parse the DAG file. Why does it keep running?

[2018-09-10 15:49:24,123] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:24,125] {jobs.py:1538} INFO -
================================================================================
DAG File Processing Stats

File Path                                                       PID  Runtime    Last Runtime    Last Run
------------------------------------------------------------  -----  ---------  --------------  -------------------
/Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py  29257  0.44s      0.43s           2018-09-10T13:49:22
================================================================================
[2018-09-10 15:49:24,125] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:25,133] {dag_processing.py:582} INFO - Started a process (PID: 29258) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:25,560] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:25,561] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:26,567] {dag_processing.py:582} INFO - Started a process (PID: 29259) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:26,993] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:27,001] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:28,009] {dag_processing.py:582} INFO - Started a process (PID: 29260) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:28,439] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:28,440] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:29,445] {dag_processing.py:582} INFO - Started a process (PID: 29261) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:29,872] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:29,873] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:30,876] {dag_processing.py:582} INFO - Started a process (PID: 29263) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:31,309] {jobs.py:1108} INFO - No tasks to consider for execution.
asked Sep 10 '18 by Xiang Zhang

People also ask

How do I stop my DAG from running Airflow?

Stopping a DAG and clearing its tasks from the UI does not stop tasks that are already running in the executor. While a task is in the running state, you can click Clear, which calls the job.kill() function on the task. That sets the task's state to shut_down, which is shifted to up_for_retry immediately.
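If the goal is to stop future runs entirely, pausing the DAG also works. A minimal sketch using the Airflow 2.x CLI (in Airflow 1.10 the equivalent was airflow pause), assuming the DAG id matches the file name from the question:

# stop the scheduler from creating new runs for this DAG
airflow dags pause my_tutorial_2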

How do I know if my Airflow scheduler is running?

At startup, the scheduler creates a BaseJob record containing the host and a heartbeat timestamp, and then updates it regularly. You can use this record to check whether the scheduler is working correctly with the airflow jobs check command. On failure, the command exits with a non-zero error code.
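A minimal sketch of that check, assuming Airflow 2.1 or later (where the jobs check subcommand is available):

# exits 0 if a scheduler has heartbeat recently, non-zero otherwise
airflow jobs check --job-type SchedulerJob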

How can I speed up my Airflow scheduler?

One approach is to increase the number of threads available on the machine running the scheduler process, so that the max_threads parameter can be set to a higher value. With a higher value, the Airflow scheduler can process a larger number of DAGs more effectively.
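For example, in airflow.cfg (a sketch assuming a pre-2.0 Airflow, where this setting is named max_threads; in 2.x it was renamed parsing_processes):

[scheduler]
# number of processes used to parse DAG files in parallel
max_threads = 4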


1 Answer

The scheduler will "heartbeat" your DAG files based on the contents of your airflow.cfg. The two settings most relevant to this are probably:

min_file_process_interval: How many seconds to wait between file-parsing loops to prevent the logs from being spammed.

scheduler_heartbeat_sec: The scheduler constantly tries to trigger new tasks (look at the scheduler section in the docs for more information). This defines how often the scheduler should run (in seconds).

Consider changing these if you are only running a few DAGs with tasks that are not run very often.
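For example, raising both values in the [scheduler] section of airflow.cfg will quiet down the parsing loop shown in the question. The numbers below are illustrative, not recommendations:

[scheduler]
# seconds between scheduler loop iterations
scheduler_heartbeat_sec = 30
# seconds to wait before re-parsing the same DAG file
min_file_process_interval = 60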

answered Oct 12 '22 by Viraj Parekh