How to use airflow with Celery

Tags:

python

airflow

I'm new to Airflow and Celery. I have finished writing my DAG, but now I want to run its tasks on two computers that are in the same subnet, and I'd like to know how to modify airflow.cfg to do that. Some examples would be great. Thanks for any answers.

Fewfy asked Jul 16 '17


People also ask

How do you run Airflow on Celery?

To set up the Airflow Celery Executor, first you need to set up a Celery backend using a message broker service such as RabbitMQ or Redis. After that, you need to change the airflow.cfg file to point the executor parameter to CeleryExecutor and enter all the required configuration for it.
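
For example, with RabbitMQ as the broker, the relevant airflow.cfg lines would look roughly like this (the host, user, password, and port are placeholders, not values from the original answer):

    [core]
    executor = CeleryExecutor

    [celery]
    broker_url = amqp://user:password@your_rabbitmq_host:5672//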

Which Executor is best for Airflow?

Airflow comes configured with the SequentialExecutor by default, which is a local executor and the safest option for execution, but we strongly recommend changing this to the LocalExecutor for small, single-machine installations, or to one of the remote executors for a multi-machine/cloud installation.

How does Celery Executor work?

To optimize for flexibility and availability, the Celery executor works with a "pool" of independent workers and uses messages to delegate tasks. With Celery, your deployment's scheduler adds a message to the queue and the Celery broker delivers it to a Celery worker (perhaps one of many) to execute.
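
The same pattern is easy to see in plain Celery, independent of Airflow: a broker queue plus a pool of workers that consume from it. A minimal sketch, with a hypothetical module name and broker URL:

    # tasks.py -- hypothetical module; any machine that can reach the broker can run a worker
    from celery import Celery

    app = Celery('tasks', broker='redis://your_redis_host:6379/1')

    @app.task
    def add(x, y):
        # runs on whichever worker receives the message from the broker
        return x + y

Running celery -A tasks worker on several machines gives you the worker pool; calling add.delay(2, 3) from anywhere simply enqueues a message, and the broker hands it to one of the workers.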

How do I set up an Airflow cluster?

To set up an Airflow cluster, we need to install the components and services below:

Airflow Webserver: a web interface to query the metadata database and to monitor and execute DAGs.

Airflow Scheduler: checks the status of the DAGs and tasks in the metadata database, creates new ones if necessary, and sends the tasks to the queues.
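
In practice, starting a small two-machine cluster looks roughly like this (Airflow 1.x commands, matching the era of this question; the machine layout is just one possible arrangement):

    # machine 1: metadata database, webserver, and scheduler
    airflow webserver
    airflow scheduler

    # machine 2 (and any additional workers): consume tasks from the queue
    airflow worker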


1 Answer

The Airflow documentation covers this quite nicely:

First, you will need a Celery backend. This can be, for example, Redis or RabbitMQ. Then, the executor parameter in your airflow.cfg should be set to CeleryExecutor.

Then, in the celery section of the airflow.cfg, set broker_url to point to your Celery backend (e.g. redis://your_redis_host:your_redis_port/1), and point celery_result_backend to a SQL database (you can use the same database as your main Airflow db).
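
Putting those two steps together, the relevant airflow.cfg entries would look roughly like this (hosts, ports, and credentials are placeholders; note that newer Airflow versions renamed celery_result_backend to result_backend):

    [core]
    executor = CeleryExecutor

    [celery]
    broker_url = redis://your_redis_host:your_redis_port/1
    celery_result_backend = db+postgresql://airflow:airflow@your_db_host:5432/airflow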

Then, on your worker machines, simply kick off airflow worker, and your jobs should start on the two machines.
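
For example, assuming airflow.cfg and your DAG files are deployed to both machines:

    # run on each of the two worker machines
    airflow worker

In Airflow 2.x the equivalent command is airflow celery worker.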

Matthijs Brouns answered Sep 20 '22