I followed the tutorial, created a folder $AIRFLOW_HOME/dags, and put the tutorial DAG Python file there. I then started the Airflow scheduler. By default the DAG is paused, but when I look at the output of airflow scheduler, I see a lot of runs that appear to be trying to create the DAGs. Why does it keep running?
[2018-09-10 15:49:24,123] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:24,125] {jobs.py:1538} INFO -
================================================================================
DAG File Processing Stats
File Path PID Runtime Last Runtime Last Run
------------------------------------------------------------ ----- --------- -------------- -------------------
/Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py 29257 0.44s 0.43s 2018-09-10T13:49:22
================================================================================
[2018-09-10 15:49:24,125] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:25,133] {dag_processing.py:582} INFO - Started a process (PID: 29258) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:25,560] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:25,561] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:26,567] {dag_processing.py:582} INFO - Started a process (PID: 29259) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:26,993] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:27,001] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:28,009] {dag_processing.py:582} INFO - Started a process (PID: 29260) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:28,439] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:28,440] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:29,445] {dag_processing.py:582} INFO - Started a process (PID: 29261) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:29,872] {jobs.py:1108} INFO - No tasks to consider for execution.
[2018-09-10 15:49:29,873] {dag_processing.py:495} INFO - Processor for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py finished
[2018-09-10 15:49:30,876] {dag_processing.py:582} INFO - Started a process (PID: 29263) to generate tasks for /Users/xiang/Documents/BigData/airflow/dags/my_tutorial_2.py
[2018-09-10 15:49:31,309] {jobs.py:1108} INFO - No tasks to consider for execution.
If you stop a DAG and clear the task from the UI, the running tasks in the executor will not stop. When a task is in the running state, you can click CLEAR, which calls the job.kill() function on the task. This function sets the task's status to shut_down, which is immediately shifted to up_for_retry.
CLI check for the scheduler: the scheduler creates a BaseJob entry with information about the host and a timestamp (heartbeat) at startup, and then updates it regularly. You can use this to check whether the scheduler is working correctly by running the airflow jobs check command. On failure, the command exits with a non-zero error code.
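As a rough sketch of how that exit code could be used from a script, the snippet below wraps the CLI call. It assumes an Airflow 2.x installation where `airflow jobs check --job-type SchedulerJob` is available (the 2018 logs in the question come from an older release that does not have it). The helper names `scheduler_check_command` and `scheduler_is_healthy` are made up for illustration.

```python
import subprocess


def scheduler_check_command(job_type="SchedulerJob"):
    """Build the CLI invocation (assumes the Airflow 2.x `airflow jobs check` command)."""
    return ["airflow", "jobs", "check", "--job-type", job_type]


def scheduler_is_healthy(run=subprocess.run):
    """Return True when the check command exits with code 0.

    `run` is injectable so the function can be exercised without Airflow installed.
    """
    result = run(scheduler_check_command(), capture_output=True, text=True)
    return result.returncode == 0
```

Calling `scheduler_is_healthy()` on the scheduler host would then return False whenever the command reports a failure, which can be wired into a monitoring probe.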
One can take a different approach by increasing the number of threads available on the machine that runs the scheduler process so that the max_threads parameter can be set to a higher value. With a higher value, the Airflow scheduler will be able to more effectively process the increased number of DAGs.
The scheduler will "heartbeat" your DAG files based on the contents of your airflow.cfg. The two settings that are probably most relevant to this are:
min_file_process_interval: How many seconds to wait between file-parsing loops to prevent the logs from being spammed.
scheduler_heartbeat_sec: The scheduler constantly tries to trigger new tasks (look at the scheduler section in the docs for more information). This defines how often the scheduler should run (in seconds).
Consider changing these if you are only running a few DAGs with tasks that are not run very often.
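To make the two settings concrete, here is a minimal sketch that reads them from an airflow.cfg-style text with Python's standard configparser. The values 60 and 5 are illustrative assumptions, not recommendations, and the helper names are hypothetical. The `AIRFLOW__SECTION__KEY` pattern is Airflow's convention for overriding a config value through an environment variable.

```python
import configparser

# Illustrative fragment of airflow.cfg; the numbers are assumed example values.
SAMPLE_CFG = """
[scheduler]
min_file_process_interval = 60
scheduler_heartbeat_sec = 5
"""


def read_scheduler_settings(cfg_text):
    """Parse the [scheduler] section and return the two intervals as integers."""
    # Interpolation is disabled so literal '%' characters in a real config do not break parsing.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    section = parser["scheduler"]
    return {
        "min_file_process_interval": section.getint("min_file_process_interval"),
        "scheduler_heartbeat_sec": section.getint("scheduler_heartbeat_sec"),
    }


def env_override_name(section, key):
    """Name of the environment variable that overrides a given config key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


settings = read_scheduler_settings(SAMPLE_CFG)
print(settings)
print(env_override_name("scheduler", "min_file_process_interval"))
```

With a larger min_file_process_interval, the "Started a process ... to generate tasks" lines from the question should appear less often, since each DAG file is re-parsed less frequently.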