Are there substantial performance implications that I should keep in mind when Celery workers are pulling from multiple (or perhaps many) queues? For example, would there be a significant performance penalty if my system were designed so that workers pulled from 10 to 15 queues rather than just 1 or 2? As a follow-up, what if some of those queues are sometimes empty?
celery beats only trigger those 1000 tasks (by the crontab schedule), not run them. If you want to run 1000 tasks in parallel, you should have enough celery workers available to run those tasks.
The Celery worker then has to wait for every task before it starts execution. This demonstrates how Celery made use of Redis to distribute tasks across multiple workers and to manage the task queue.
delay() is the quickest way to send a task message to Celery. This method is a shortcut to the more powerful . apply_async() , which additionally supports execution options for fine-tuning your task message.
It's incredibly lightweight, supports multiple brokers (RabbitMQ, Redis, and Amazon SQS), and also integrates with many web frameworks, e.g. Django, etc. Celery's asynchronous task queue allows the execution of tasks and its concurrency makes it useful in several production systems.
The short answer to your question on queue limits is:
Don't worry having multiple queues will not be worse or better, broker are designed to handle huge numbers of them. Off course in a lot of use cases you don't need so many, except really advanced one. Empty queues don't create any problem, they just take a tiny amount of memory on the broker.
Don't forget also that you have other things like exchanges and bindings, also there you don't have real limits but is better you understand the performance implication of each of them before using it (a TOPIC exchange will use more CPU than a direct one for example)
To give you a more complete answer let's look at the performance topic from a more generic point of view.
When looking at a distributed system based on message passing like Celery there are 2 main topics to analyze from the point of view of performance:
The workers number and concurrency factor.
As you probably already know each celery worker has a concurrency parameter that sets how many tasks can be executed at the same time, this should be set in relation with the server capacity(CPU, RAM, I/O) and off course also based on the type of tasks that the specific consumer will execute (depends on the queue that it will consume).
Off course depending on the total number of tasks you need to execute in a certain time window you will need to decide how many workers/servers you will need to have up and running.
The broker, the Single point of Failure in this architecture style.
The broker, especially RabbitMQ, is designed to manage millions of messages without any problem, however more messages it will need to store more memory will use and more are the messages to route more CPU it will use.
This machine should be well tuned too and if possible be in high availability.
Off course the main thing to avoid is the messages are consumed at a lower rate than they are produced otherwise your queue will keep growing and your RabbitMQ will explode. Here you can find some hints.
There are cases where you may also need to increase the number of tasks executed in a certain time frame but on only in response to peaks of requests. The nice thing about this architecture is that you can monitor the size of the queues and when you understand is growing to fast you could create new machines on the fly with a celery worker already configured and than turn it off when they are not needed. This is a quite cost saving and efficient approach.
One hint, remember to don't store celery tasks results in RabbitMQ.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With