
In Celery, are there significant performance implications of using many queues?

Tags:

python

celery

Are there substantial performance implications that I should keep in mind when Celery workers are pulling from multiple (or perhaps many) queues? For example, would there be a significant performance penalty if my system were designed so that workers pulled from 10 to 15 queues rather than just 1 or 2? As a follow-up, what if some of those queues are sometimes empty?

asked May 10 '16 at 00:05 by FMc




1 Answer

The short answer to your question on queue limits is:

Don't worry: having multiple queues is not inherently better or worse, because brokers are designed to handle huge numbers of them. Of course, in many use cases you don't need that many, except in really advanced ones. Empty queues don't create any problem; they just take a tiny amount of memory on the broker.
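To make "many queues" concrete, here is a minimal sketch of a Celery config module, assuming a hypothetical project `proj` and hypothetical task and queue names. Routing uses Celery's `task_routes` setting, and a single worker can consume all of the queues at once through the `-Q` option of `celery worker`. The last two assignments are only local helpers for building that `-Q` argument; they are not Celery settings.

```python
# celeryconfig.py (sketch): the project "proj", the task names and the
# queue names are all hypothetical placeholders.

broker_url = "amqp://guest:guest@localhost:5672//"

# Tasks without an explicit route are sent to this queue.
task_default_queue = "default"

# One queue per (hypothetical) task type. With Celery's default
# task_create_missing_queues=True, these queues are declared automatically.
task_routes = {
    "proj.tasks.resize_image": {"queue": "images"},
    "proj.tasks.send_email": {"queue": "emails"},
    "proj.tasks.build_report": {"queue": "reports"},
    "proj.tasks.sync_crm": {"queue": "crm"},
}

# Local helpers (not Celery settings): every queue a worker might consume.
all_queues = sorted({task_default_queue} | {r["queue"] for r in task_routes.values()})
worker_q_arg = ",".join(all_queues)

# A single worker subscribing to all of them (shell command, not Python):
#   celery -A proj worker -Q crm,default,emails,images,reports
```

Going from 5 queues to 15 is just more entries in `task_routes` and a longer `-Q` list; when some of those queues are empty, the worker simply has nothing to fetch from them.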

Don't forget that you also have other entities, such as exchanges and bindings. There are no real limits on those either, but it's better to understand the performance implications of each of them before using it (for example, a topic exchange uses more CPU than a direct one).
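Why a topic exchange costs more CPU than a direct one can be seen from the routing rules themselves. The sketch below is a deliberately naive model, not how RabbitMQ implements it internally (real brokers use more optimized data structures): a direct exchange needs one exact lookup of the routing key, while a topic exchange has to match dot-separated words against binding patterns, where `*` matches exactly one word and `#` matches zero or more words. The binding keys and queue names used in the test are hypothetical.

```python
from __future__ import annotations


def topic_matches(binding_key: str, routing_key: str) -> bool:
    """AMQP-style topic matching: '*' = exactly one word, '#' = zero or more words."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p: int, w: int) -> bool:
        if p == len(pattern):
            # Pattern exhausted: only a match if all words were consumed too.
            return w == len(words)
        token = pattern[p]
        if token == "#":
            # '#' may absorb any number of words, so try every split point.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False
        if token == "*" or token == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


def route_direct(bindings: dict[str, list[str]], routing_key: str) -> list[str]:
    # Direct exchange: a single hash lookup on the exact routing key.
    return bindings.get(routing_key, [])


def route_topic(bindings: list[tuple[str, str]], routing_key: str) -> list[str]:
    # Topic exchange (naive model): every binding pattern is evaluated.
    return [queue for pattern, queue in bindings if topic_matches(pattern, routing_key)]
```

The extra work per message grows with the number and complexity of topic bindings, which is why it is worth choosing the exchange type deliberately rather than defaulting to topic everywhere.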

To give you a more complete answer, let's look at performance from a more general point of view.

When looking at a distributed, message-passing system like Celery, there are two main areas to analyze from a performance point of view:

  1. The number of workers and their concurrency factor.

    As you probably already know, each Celery worker has a concurrency parameter that sets how many tasks it can execute at the same time. This should be set according to the server's capacity (CPU, RAM, I/O) and, of course, the type of tasks that the specific consumer will execute (which depends on the queues it consumes).

    Of course, depending on the total number of tasks you need to execute in a given time window, you will also need to decide how many workers/servers to keep up and running.

  2. The broker, the single point of failure in this architectural style.

    The broker, especially RabbitMQ, is designed to manage millions of messages without any problem. However, the more messages it needs to store, the more memory it will use, and the more messages it needs to route, the more CPU it will use.

    This machine should be well tuned too and, if possible, set up for high availability.

    Of course, the main thing to avoid is messages being consumed at a lower rate than they are produced; otherwise, your queue will keep growing and your RabbitMQ will explode. Here you can find some hints.
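Both points come down to simple rate arithmetic. The sketch below is a back-of-the-envelope model, assuming a steady arrival rate and an average task duration that you would measure yourself, and assuming each worker runs `concurrency` tasks in parallel at full utilization (Celery's `--concurrency` option defaults to the number of CPU cores). It computes how many workers are needed to keep up, and how fast the backlog grows when consumers are slower than producers.

```python
import math


def consumption_rate(workers: int, concurrency: int, avg_task_seconds: float) -> float:
    """Tasks per second the whole pool can finish, assuming full utilization."""
    return workers * concurrency / avg_task_seconds


def workers_needed(arrival_rate: float, concurrency: int, avg_task_seconds: float) -> int:
    """Smallest worker count whose consumption rate is at least the arrival rate."""
    return math.ceil(arrival_rate * avg_task_seconds / concurrency)


def backlog_after(seconds: float, arrival_rate: float, workers: int,
                  concurrency: int, avg_task_seconds: float, start: float = 0.0) -> float:
    """Queue length after `seconds` in a simple model with constant rates."""
    net = arrival_rate - consumption_rate(workers, concurrency, avg_task_seconds)
    # A queue cannot go below zero messages.
    return max(0.0, start + net * seconds)


# Hypothetical load: 50 tasks/s, 2 s per task, concurrency 8 per worker.
needed = workers_needed(50, 8, 2.0)
short_by_three = backlog_after(3600, 50, needed - 3, 8, 2.0)
```

With these made-up numbers, 13 workers keep the queue empty, while 10 workers finish only 40 tasks/s and leave 36,000 messages in the broker after one hour, each of which costs broker memory.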

There are cases where you may also need to increase the number of tasks executed in a certain time frame, but only in response to peaks of requests. The nice thing about this architecture is that you can monitor the size of the queues and, when you see that one is growing too fast, create new machines on the fly with a Celery worker already configured, then turn them off when they are no longer needed. This is quite a cost-saving and efficient approach.
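The scale-up/scale-down idea can be expressed as a small decision function. This is a hypothetical policy, not part of Celery or RabbitMQ: the thresholds, the `ScalingPolicy` class and `desired_machines` are illustrative names, and actually starting or stopping machines is left to whatever cloud tooling you use. It adds a machine only when the queue is both large and still growing, and removes one when the queue is nearly empty and not growing.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical thresholds; tune them from your own monitoring data."""
    min_machines: int = 1
    max_machines: int = 20
    scale_up_backlog: int = 1000   # queue length that justifies adding a machine
    scale_down_backlog: int = 50   # queue length below which a machine can be removed


def desired_machines(policy: ScalingPolicy, current: int,
                     queue_len: int, prev_queue_len: int) -> int:
    """Return the worker-machine count for the next interval."""
    growing = queue_len > prev_queue_len
    if queue_len >= policy.scale_up_backlog and growing:
        target = current + 1
    elif queue_len <= policy.scale_down_backlog and not growing:
        target = current - 1
    else:
        target = current
    # Clamp to the allowed fleet size.
    return max(policy.min_machines, min(policy.max_machines, target))
```

The queue lengths fed into it could come from `rabbitmqctl list_queues name messages` or from the RabbitMQ management API, sampled at a fixed interval.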

One hint: remember not to store Celery task results in RabbitMQ.
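A minimal config sketch of that hint, assuming Redis as the separate result store (any other supported result backend would work the same way); the hostnames, ports and database number are placeholders. RabbitMQ remains the broker, results go elsewhere, and results are not stored at all unless a task opts in.

```python
# celeryconfig.py (sketch): hostnames, ports and DB numbers are placeholders.

# RabbitMQ stays the message broker...
broker_url = "amqp://guest:guest@localhost:5672//"

# ...but task results are stored in Redis instead of RabbitMQ.
result_backend = "redis://localhost:6379/0"

# Don't store results by default. A task whose return value you actually
# need can opt back in with @app.task(ignore_result=False); its result then
# goes to the Redis backend configured above.
task_ignore_result = True
```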

answered Oct 18 '22 at 01:10 by Mauro Rocco
