In Celery, are there significant performance implications of using many queues?

Tags: python, celery

Are there substantial performance implications that I should keep in mind when Celery workers are pulling from multiple (or perhaps many) queues? For example, would there be a significant performance penalty if my system were designed so that workers pulled from 10 to 15 queues rather than just 1 or 2? As a follow-up, what if some of those queues are sometimes empty?

asked May 10 '16 00:05

FMc

1 Answer

The short answer to your question on queue limits is:

Don't worry: having multiple queues will not make things worse or better, because brokers are designed to handle huge numbers of them. Of course, most use cases don't need that many, apart from really advanced ones. Empty queues don't create any problems; they just take a tiny amount of memory on the broker.
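
As a rough illustration of what "many queues" looks like on the Celery side, here is a minimal sketch. The queue names, the task names (`proj.tasks.job_NN`) and the app name `proj` are made up for this example; `task_routes`, `task_default_queue` and the worker's `-Q` and `--concurrency` options are standard Celery settings and flags. The helper only builds a command line, it does not start anything.

```python
# Hypothetical queue names; in a real project these come from your own design.
QUEUE_NAMES = [f"queue_{i:02d}" for i in range(1, 16)]

# Celery's `task_routes` setting maps task names to routing options.
# The task names below are invented for illustration only.
task_routes = {
    f"proj.tasks.job_{i:02d}": {"queue": name}
    for i, name in enumerate(QUEUE_NAMES, start=1)
}

# Tasks without an explicit route end up on this queue.
task_default_queue = "default"


def worker_command(app, queues, concurrency):
    """Build a `celery worker` command line that consumes from several queues."""
    return f"celery -A {app} worker -Q {','.join(queues)} --concurrency={concurrency}"


# One worker can consume all 15 queues, or groups of workers can split them.
print(worker_command("proj", QUEUE_NAMES[:3], 4))
```

Whether one worker listens on all queues or each group of workers gets a few of them is a design choice; the broker does not care much either way.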

Don't forget that there are other things too, such as exchanges and bindings. There are no real limits there either, but it is better to understand the performance implications of each of them before using it (a topic exchange will use more CPU than a direct one, for example).
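
To get a feeling for why a topic exchange costs more than a direct one, compare the work needed per binding. The sketch below is a simplified pure-Python model of the AMQP matching rules (`*` matches exactly one word, `#` matches zero or more words). It is not how RabbitMQ implements routing internally, and the function names are hypothetical.

```python
def direct_matches(binding_key, routing_key):
    """Direct exchange: a binding matches only on exact equality."""
    return binding_key == routing_key


def topic_matches(pattern, routing_key):
    """Topic exchange: compare dot-separated words, honoring '*' and '#'."""
    p_words = pattern.split(".")
    k_words = routing_key.split(".")

    def match(i, j):
        if i == len(p_words):
            # Pattern exhausted: succeed only if the key is exhausted too.
            return j == len(k_words)
        if p_words[i] == "#":
            # '#' may swallow zero or more words, so try every split point.
            return any(match(i + 1, nxt) for nxt in range(j, len(k_words) + 1))
        if j == len(k_words):
            return False
        if p_words[i] in ("*", k_words[j]):
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


print(direct_matches("images.resize", "images.resize"))   # True
print(topic_matches("images.*", "images.resize"))         # True
print(topic_matches("images.*", "images.resize.large"))   # False
print(topic_matches("images.#", "images.resize.large"))   # True
```

A direct match is a single comparison, while a topic match has to walk the words of the pattern, which is where the extra CPU goes.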

To give you a more complete answer, let's look at performance from a more general point of view.

When looking at a distributed system based on message passing, like Celery, there are two main topics to analyze from a performance point of view:

The number of workers and the concurrency factor.

As you probably already know, each Celery worker has a concurrency parameter that sets how many tasks can be executed at the same time. It should be set in relation to the server's capacity (CPU, RAM, I/O) and, of course, to the type of tasks that the specific consumer will execute (which depends on the queues it consumes).

Of course, depending on the total number of tasks you need to execute in a given time window, you will also need to decide how many workers/servers to keep up and running.
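
A back-of-the-envelope way to size this, as a sketch only: the average number of tasks in flight is roughly the arrival rate multiplied by the average task duration, and each worker offers `concurrency` slots (the `--concurrency` option or the `worker_concurrency` setting). The function name and the numbers are assumptions for illustration, and the estimate leaves no headroom for peaks.

```python
import math


def required_workers(tasks_per_second, avg_task_seconds, concurrency):
    """Estimate how many workers with the given concurrency are needed."""
    # Average tasks in flight = arrival rate * average duration.
    busy_slots = tasks_per_second * avg_task_seconds
    # Each worker provides `concurrency` slots; always keep at least one worker.
    return max(1, math.ceil(busy_slots / concurrency))


# 40 tasks/s at 0.5 s each keeps about 20 slots busy; with 8 slots per worker:
print(required_workers(tasks_per_second=40, avg_task_seconds=0.5, concurrency=8))  # 3
```

In practice you would add some margin on top of this figure and check it against the CPU, RAM and I/O limits of each server.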

The broker, the single point of failure in this architecture style.

The broker, especially RabbitMQ, is designed to manage millions of messages without any problem. However, the more messages it has to store, the more memory it uses, and the more messages it has to route, the more CPU it uses.

This machine should be well tuned too and, if possible, set up for high availability.

Of course, the main thing to avoid is messages being consumed at a lower rate than they are produced; otherwise your queue will keep growing and your RabbitMQ will eventually run out of resources. Here you can find some hints.
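
The arithmetic behind that warning is simple. The sketch below assumes constant produce and consume rates, which real systems rarely have; the function name and the figures are made up for illustration.

```python
def backlog_after(seconds, produced_per_second, consumed_per_second, initial_backlog=0):
    """Messages waiting in the queue after `seconds`, never below zero."""
    growth = (produced_per_second - consumed_per_second) * seconds
    return max(0, initial_backlog + growth)


# Producers publish 120 msg/s while workers only complete 100 msg/s.
print(backlog_after(3600, produced_per_second=120, consumed_per_second=100))  # 72000

# With consumers faster than producers, an existing backlog drains away.
print(backlog_after(600, produced_per_second=80, consumed_per_second=100,
                    initial_backlog=5000))  # 0
```

Even a small deficit adds up: 20 messages per second too slow means 72,000 messages waiting after one hour.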

There are cases where you need to increase the number of tasks executed in a given time frame, but only in response to peaks of requests. The nice thing about this architecture is that you can monitor the size of the queues and, when you see one growing too fast, create new machines on the fly with a Celery worker already configured, then turn them off when they are no longer needed. This is a cost-saving and efficient approach.
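
The scaling decision itself can be sketched as a small function. How the queue depth is obtained is left out here (for RabbitMQ, `rabbitmqctl list_queues name messages` is one way to read it), and the function name, the per-worker rate and the limits are all assumptions for illustration. The sketch also ignores messages that keep arriving while the backlog drains.

```python
import math


def desired_workers(queue_depth, per_worker_rate, drain_seconds,
                    min_workers=1, max_workers=20):
    """Workers needed to drain `queue_depth` messages within `drain_seconds`."""
    needed = math.ceil(queue_depth / (per_worker_rate * drain_seconds))
    # Clamp to the fleet limits so a spike cannot start unlimited machines.
    return min(max_workers, max(min_workers, needed))


# 30,000 waiting messages, 10 msg/s per worker, target of 5 minutes to drain.
print(desired_workers(queue_depth=30000, per_worker_rate=10, drain_seconds=300))  # 10

# An empty queue scales back down to the minimum.
print(desired_workers(queue_depth=0, per_worker_rate=10, drain_seconds=300))      # 1
```

A monitoring job could call something like this periodically and start or stop preconfigured worker machines to match the result.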

One hint: remember not to store Celery task results in RabbitMQ.
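
In configuration terms, that hint could look like the sketch below. The setting names (`broker_url`, `result_backend`, `task_ignore_result`, `result_expires`) are standard Celery settings; the URLs, including the local Redis instance, are placeholders assumed for this example.

```python
# celeryconfig.py (sketch; URLs are placeholders)

# Keep task messages on RabbitMQ ...
broker_url = "amqp://guest:guest@localhost:5672//"

# ... but store results somewhere else (a hypothetical local Redis here).
result_backend = "redis://localhost:6379/0"

# Skip storing results by default for tasks whose return value is never read.
task_ignore_result = True

# Expire any results that are stored after one hour (value in seconds).
result_expires = 3600
```

If none of your tasks need their return value, ignoring results altogether is the simplest option.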

answered Oct 18 '22 01:10

Mauro Rocco
