Understanding celery task prefetching

Tags: python, celery, celeryd

I just found out about the configuration option CELERYD_PREFETCH_MULTIPLIER (docs). The default is 4, but (I believe) I want prefetching off or as low as possible. I set it to 1 now, which is close enough to what I'm looking for, but there are still some things I don't understand:

1. Why is this prefetching a good idea? I don't really see a reason for it, unless there's a lot of latency between the message queue and the workers (in my case, they are currently running on the same host and at worst might eventually run on different hosts in the same data center). The documentation only mentions the disadvantages, but fails to explain what the advantages are.
2. Many people seem to set this to 0, expecting to be able to turn off prefetching that way (a reasonable assumption in my opinion). However, 0 means unlimited prefetching. Why would anyone ever want unlimited prefetching? Doesn't that entirely eliminate the concurrency/asynchronicity you introduced a task queue for in the first place?
3. Why can prefetching not be turned off? It might not be a good idea for performance to turn it off in most cases, but is there a technical reason for this not to be possible? Or is it just not implemented?
4. Sometimes, this option is connected to CELERY_ACKS_LATE. For example, Roger Hu writes: «[…] often what [users] really want is to have a worker only reserve as many tasks as there are child processes. But this is not possible without enabling late acknowledgements […]» I don't understand how these two options are connected and why one is not possible without the other. Another mention of the connection can be found here. Can someone explain why the two options are connected?

298

asked Apr 16 '13 14:04

Henrik Heimbuerger

1 Answer

1. Prefetching can improve performance. Workers don't have to wait for the broker to deliver the next message before they can start processing it. Fetching several messages in one exchange with the broker is cheaper than fetching them one at a time, because getting a message from a broker (even a local one) is expensive compared to a local memory access. Workers are also allowed to acknowledge messages in batches.
2. Setting the prefetch multiplier to zero means "no specific limit" rather than unlimited.
3. Setting the prefetch multiplier to 1 is documented to be equivalent to turning prefetching off, but this may not always be the case (see https://stackoverflow.com/a/33357180/71522).
4. Prefetching allows messages to be acknowledged in batches. CELERY_ACKS_LATE=True prevents messages from being acknowledged as soon as they reach a worker; with late acknowledgement, a message is acknowledged only after its task has run.
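
The two settings discussed above could be combined roughly as follows. This is only a sketch: the module name celeryconfig.py is an assumption, the uppercase setting names are the old-style ones used in the question, and effective_prefetch_limit is an illustrative helper, not a Celery function. It just spells out the documented rule that the multiplier applies per concurrent worker process.

```python
# celeryconfig.py (hypothetical module name; old-style setting names
# exactly as they appear in the question)

# Reserve at most one message per worker process instead of the default four.
CELERYD_PREFETCH_MULTIPLIER = 1

# Acknowledge a message only after its task has run, not on receipt.
CELERY_ACKS_LATE = True


def effective_prefetch_limit(multiplier, concurrency):
    """Illustrative helper, NOT part of Celery.

    Returns how many messages one worker may reserve at a time, or None
    when the multiplier is 0 and no specific limit applies.
    """
    if multiplier == 0:
        return None
    return multiplier * concurrency


if __name__ == "__main__":
    # A worker with 8 processes and the default multiplier of 4.
    print(effective_prefetch_limit(4, 8))
    # The same worker with the multiplier lowered to 1.
    print(effective_prefetch_limit(CELERYD_PREFETCH_MULTIPLIER, 8))
```

Newer Celery releases spell these settings in lowercase (worker_prefetch_multiplier and task_acks_late), so check the names against the documentation for the version in use.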

143

answered Sep 20 '22 09:09

mher
