 

Why are my Airflow tasks queued but not running?

I am new to Airflow and am trying to set it up to run ETL pipelines. I was able to install:

  1. Airflow
  2. Postgres
  3. Celery
  4. RabbitMQ

I am able to test-run the tutorial DAG. When I try to schedule the jobs, the scheduler picks them up and queues them, which I can see in the UI, but the tasks never run. Could somebody help me fix this issue?

Here is my config file:

[core]
airflow_home = /root/airflow
dags_folder = /root/airflow/dags
base_log_folder = /root/airflow/logs
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://xxxx.amazonaws.com:5432/airflow
api_client = airflow.api.client.local_client

[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30

[celery]
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8793
broker_url = amqp://rabbit:[email protected]/rabbitmq_vhost
celery_result_backend = db+postgresql+psycopg2://postgres:[email protected]:5432/airflow
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default
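
As a side note on the [celery] block above: Flower is already configured there on port 5555, and Airflow ships a command to launch it, which makes it easy to see whether any Celery workers are actually registered with the broker (no registered workers means queued tasks have nothing to pick them up):

airflow flower

Then browse to http://<host>:5555 and check the workers list.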

DAG: I used the tutorial DAG, with the start date set to 'start_date': datetime(2017, 4, 11).
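
For context, here is a condensed sketch of that DAG, abbreviated from the stock tutorial example in the Airflow 1.x docs (only the first task is shown):

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 4, 11),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('tutorial', default_args=default_args,
          schedule_interval=timedelta(days=1))

# First task of the tutorial: just print the current date
t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)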

asked Apr 20 '17 by Deepak S

People also ask

What is queue in Airflow?

queue is an attribute of BaseOperator, so any task can be assigned to any queue. The default queue for the environment is defined in airflow.cfg's celery -> default_queue setting. This defines the queue that tasks are assigned to when none is specified, as well as which queue Airflow workers listen to when started.
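
For illustration, a task can be routed to a specific queue through the queue argument that every operator inherits from BaseOperator (the operator, script, and queue name below are hypothetical):

from airflow.operators.bash_operator import BashOperator

# Route this task to a dedicated queue instead of the default_queue;
# assumes an existing `dag` object like the tutorial sketch above
t_heavy = BashOperator(
    task_id='heavy_job',
    bash_command='run_etl.sh',  # hypothetical command
    queue='etl_queue',          # hypothetical queue name
    dag=dag,
)

A worker must then subscribe to that queue, e.g. airflow worker -q etl_queue, or the task will sit in the queued state indefinitely.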

How do I know if the Airflow scheduler is running?

At startup, the scheduler creates a BaseJob record with information about the host and a timestamp (the heartbeat), and then updates it regularly. You can use this to check whether the scheduler is working correctly via the airflow jobs check command. On failure, the command exits with a non-zero error code.
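
For example (note this subcommand is from the Airflow 2.x CLI; it does not exist in the 1.x releases discussed in this question):

airflow jobs check --job-type SchedulerJob

Exit code 0 means a recent scheduler heartbeat was found; anything else means the scheduler is not running or not healthy.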


1 Answer

Have you run all three components of Airflow, namely:

airflow webserver
airflow scheduler
airflow worker

If you only run the first two, the tasks will be queued but never executed; airflow worker provides the Celery workers that actually execute the DAGs.

Also note that Celery 4.0.2 is not compatible with Airflow 1.7 or 1.8 at the moment. Use Celery 3 instead.
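
If the Celery version turns out to be the culprit, downgrading is enough; the exact pin below is an assumption, chosen to match the lower bound Airflow 1.8 declared for itself:

pip install 'celery>=3.1.17,<4.0'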

answered by Xia Wang