Airflow: When to use CeleryExecutor and when to use MesosExecutor

I am pretty new to Airflow and trying to understand how we should set it up in our environment (on AWS).

I read that Airflow uses Celery with a Redis broker. How is that different from Mesos? I have not used Celery before, but I tried setting up Celery with Redis on my dev machine and it worked with ease. However, adding new components means adding more monitoring.

Since we already use Mesos for our cluster management, what am I missing if I don't choose Celery and go with MesosExecutor instead?

Asked Apr 17 '16 18:04 by Roger

People also ask

What is CeleryExecutor in airflow?

CeleryExecutor is one of the ways you can scale out the number of workers. For this to work, you need to set up a Celery backend (RabbitMQ, Redis, …), change your airflow.cfg so that the executor parameter points to CeleryExecutor, and provide the related Celery settings.
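As a minimal sketch, the relevant airflow.cfg sections can be rendered with Python's standard configparser. The broker and result-backend URLs below are placeholders (a Redis broker on localhost and a PostgreSQL result backend are assumptions, not requirements). The section and key names ([core] executor, [celery] broker_url, [celery] result_backend) match recent Airflow releases; older releases used a different key name for the result backend, so check the configuration reference for your version.

```python
import configparser
import io

# Placeholder connection strings (assumptions); replace with your own hosts and credentials.
BROKER_URL = "redis://localhost:6379/0"
RESULT_BACKEND = "db+postgresql://airflow:airflow@localhost:5432/airflow"


def celery_config_fragment(broker_url: str, result_backend: str) -> str:
    """Render the airflow.cfg sections that switch Airflow to CeleryExecutor."""
    cfg = configparser.ConfigParser()
    cfg["core"] = {"executor": "CeleryExecutor"}
    cfg["celery"] = {
        "broker_url": broker_url,          # where queued task messages are sent
        "result_backend": result_backend,  # where Celery records task state
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(celery_config_fragment(BROKER_URL, RESULT_BACKEND))
```

The printed fragment can be merged into an existing airflow.cfg; every node that runs the scheduler or a worker needs the same values.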

Why Redis is used in airflow?

Redis can act as the message broker for the Celery workers that Airflow uses to scale out. Redis is a simple caching server that scales out quite well, and it can be made resilient by deploying it as a cluster. RabbitMQ is a common alternative broker for the same setup.

How do you set up Celery executor airflow?

To set up the Airflow Celery Executor, you first need to set up a Celery backend using a message broker service such as RabbitMQ or Redis. After that, you need to change the airflow.cfg file so that the executor parameter points to CeleryExecutor, and enter all the configuration it requires.
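Airflow can also read any configuration option from an environment variable named AIRFLOW__{SECTION}__{KEY}, which is often easier than editing airflow.cfg on every node (for example in container images). The helper below is a hypothetical convenience, not part of Airflow, that turns a section/key mapping into those variable names; the broker and result-backend URLs are placeholders.

```python
from typing import Dict


def airflow_env_overrides(settings: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Hypothetical helper: map {section: {key: value}} to AIRFLOW__SECTION__KEY names."""
    env = {}
    for section, options in settings.items():
        for key, value in options.items():
            env[f"AIRFLOW__{section.upper()}__{key.upper()}"] = value
    return env


celery_settings = {
    "core": {"executor": "CeleryExecutor"},
    "celery": {
        # Placeholder URLs (assumptions); point these at your own broker and database.
        "broker_url": "redis://localhost:6379/0",
        "result_backend": "db+postgresql://airflow:airflow@localhost:5432/airflow",
    },
}

if __name__ == "__main__":
    # Print shell export lines that can be pasted into a startup script.
    for name, value in sorted(airflow_env_overrides(celery_settings).items()):
        print(f"export {name}='{value}'")
```

Environment variables take precedence over values in airflow.cfg, so this is a convenient way to switch executors per environment without maintaining separate config files.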

What are workers in airflow?

The Airflow workers pick up tasks that are scheduled for execution and execute them. As such, the workers are responsible for actually 'doing the work'. (Source: Data Pipelines with Apache Airflow, MEAP V05.)
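The scheduler/worker split can be illustrated with a toy, in-process model: a queue.Queue stands in for the message broker and threads stand in for Celery workers. This is only an analogy built on the Python standard library, not Airflow's or Celery's API; real workers run as separate processes, usually on separate machines, and receive tasks from the broker over the network.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

# A task is an identifier plus the callable that performs the work.
Task = Tuple[str, Callable[[], str]]


def worker(name: str, broker: "queue.Queue[Optional[Task]]",
           results: List[str], lock: threading.Lock) -> None:
    """Pull tasks off the shared queue and run them until a stop sentinel arrives."""
    while True:
        item = broker.get()
        if item is None:  # sentinel: no more work for this worker
            broker.task_done()
            break
        task_id, fn = item
        output = fn()  # the worker, not the scheduler, actually does the work
        with lock:
            results.append(f"{name} ran {task_id}: {output}")
        broker.task_done()


def run_pipeline(task_ids: List[str], num_workers: int = 2) -> List[str]:
    broker: "queue.Queue[Optional[Task]]" = queue.Queue()  # stand-in for Redis/RabbitMQ
    results: List[str] = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}", broker, results, lock))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    # The "scheduler" side: enqueue tasks that are ready to run.
    for tid in task_ids:
        broker.put((tid, lambda tid=tid: tid.upper()))
    # One sentinel per worker so every thread exits once the queue drains.
    for _ in threads:
        broker.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for line in run_pipeline(["extract", "transform", "load"]):
        print(line)
```

Which worker runs which task depends on thread scheduling, so the output order varies between runs; scaling out simply means adding more consumers on the same queue.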


1 Answer

Using Celery is the more proven/stable approach at the moment.

For us, managing dependencies using containers is more convenient than managing dependencies on the Mesos instances, which is what you have to do if you choose MesosExecutor. As such, we are finding Celery more flexible.

We are currently using Celery + RabbitMQ, but we plan to switch to MesosExecutor in the future, as our codebase stabilises.

Answered Sep 28 '22 02:09 by ImDarrenG