
How to Guarantee Message Delivery with Celery?

I have a Python application where I want to start doing more work in the background so that it scales better as it gets busier. In the past I have used Celery for normal background tasks, and that has worked well.

The only difference between this application and the ones I have built in the past is that I need to guarantee that these messages are processed; they can't be lost.

For this application I'm not too concerned about the speed of my message queue; I need reliability and durability first and foremost. To be safe, I want two queue servers in different data centers, one acting as a backup of the other, in case something goes wrong.

Looking at Celery, it appears to support a bunch of different backends, some with more features than others. The two most popular seem to be Redis and RabbitMQ, so I took some time to examine them further.

RabbitMQ: Supports durable queues and clustering, but the problem with the way clustering works today is that if you lose a node in the cluster, all messages on that node are unavailable until you bring it back online. It doesn't replicate the messages between the nodes in the cluster; it only replicates the metadata about the message and then goes back to the originating node to fetch it. If that node isn't running, you are S.O.L. Not ideal.

The way they recommend getting around this is to set up a second server, replicate the file system using DRBD, and then run something like Pacemaker to switch clients to the backup server when needed. This seems pretty complicated. Does anyone know of a better way?

Redis: Supports a read slave, which would give me a backup in case of emergencies, but it doesn't support a master-master setup, and I'm not sure whether it handles active failover between master and slave. It doesn't have the same features as RabbitMQ, but it looks much easier to set up and maintain.

Questions:

  1. What is the best way to set up Celery so that it guarantees message processing? (I've put a rough sketch of the settings I have in mind after this list.)

  2. Has anyone done this before? If so, would you mind sharing what you did?
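To make what I mean by "guaranteed" a bit more concrete, here is a rough sketch of the Celery settings I understand to be relevant (the broker host is made up, and the setting names follow the older uppercase config style, so double-check them against your Celery version):

```python
# celeryconfig.py -- sketch of delivery-related settings (not a full config)
from kombu import Exchange, Queue

BROKER_URL = "amqp://guest:guest@broker-host//"   # placeholder broker host

# Only ack a message after the task has finished, so a worker crash
# mid-task puts the message back on the queue instead of losing it.
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1

# Keep retrying the publish if the broker connection drops.
CELERY_TASK_PUBLISH_RETRY = True

# Durable queue + persistent delivery, so messages survive a broker restart.
CELERY_DEFAULT_DELIVERY_MODE = "persistent"
CELERY_QUEUES = (
    Queue("critical", Exchange("critical", durable=True),
          routing_key="critical", durable=True),
)
```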

asked Jul 05 '11 by Ken Cochrane




2 Answers

A lot has changed since the OP asked! There is now an option for highly available, a.k.a. "mirrored", queues. This goes a long way toward solving the problem you described. See http://www.rabbitmq.com/ha.html.
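Mirroring is configured on the broker side rather than in Celery; the sketch below shows roughly what that looks like (the policy name and queue pattern are just examples, and the syntax assumes RabbitMQ 3.x policies). Celery itself only needs to keep using durable queues and late acks so an unacked message gets re-queued on a surviving mirror:

```python
# RabbitMQ side (shell), mirroring every queue whose name starts with "celery":
#
#   rabbitmqctl set_policy ha-celery "^celery" '{"ha-mode":"all"}'
#
# Celery side: durable queues and late acks, e.g. in celeryconfig.py
from kombu import Exchange, Queue

CELERY_ACKS_LATE = True
CELERY_QUEUES = (
    Queue("celery", Exchange("celery", durable=True),
          routing_key="celery", durable=True),
)
```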

answered Sep 20 '22 by Chris Johnson


You might want to check out IronMQ. It covers your requirements (durable, highly available, etc.), and since it's a hosted cloud service there is zero maintenance. There's also a Celery broker for it: https://github.com/iron-io/iron_celery, so you can start using it just by changing your Celery config.
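Switching the broker is roughly this (a sketch based on the iron_celery README; the project id and token are placeholders for your own credentials):

```python
# celeryconfig.py -- point Celery at IronMQ instead of RabbitMQ/Redis
import iron_celery  # importing registers the ironmq:// transport with Kombu

BROKER_URL = "ironmq://MY_PROJECT_ID:MY_TOKEN@"
```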

answered Sep 19 '22 by Travis Reeder