Celery Task Grouping/Aggregation

I'm planning to use Celery to handle sending push notifications and emails triggered by events from my primary server.

These tasks require opening a connection to an external server (GCM, APNS, an email server, etc.). They can be processed one at a time, or handled in bulk with a single connection for much better performance.

Often there will be several instances of these tasks triggered separately in a short period of time. For example, in the space of a minute, there might be several dozen push notifications that need to go out to different users with different messages.

What's the best way of handling this in Celery? It seems like the naïve way is to simply have a different task for each message, but that requires opening a connection for each instance.
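To make the overhead concrete, here is a minimal sketch of that naïve approach (the task, the broker URL, and the open_push_connection() helper are all hypothetical):

from celery import Celery

app = Celery('notifications', broker='redis://localhost')  # broker URL is an assumption

@app.task
def send_push(device_token, message):
    # One connection per message: this is the overhead in question.
    conn = open_push_connection()  # hypothetical helper that connects to GCM/APNS
    try:
        conn.send(device_token, message)
    finally:
        conn.close()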

I was hoping there would be some sort of task aggregator allowing me to process e.g. 'all outstanding push notification tasks'.

Does such a thing exist? Is there a better way to go about it, for example by appending to an active task group?

Am I missing something?

Robert

asked Sep 23 '12 by erydo


People also ask

Can Celery tasks be async?

Celery tasks run asynchronously: the function call in the calling process returns immediately after a message requesting the task is sent to the broker. There are two ways to get results back from your tasks.
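For example, with a result backend configured you can block on the AsyncResult (a minimal sketch; the add task and the broker/backend URLs are assumptions):

from celery import Celery

# Broker/backend URLs are assumptions; a result backend is required for .get().
app = Celery('tasks', broker='redis://localhost', backend='redis://localhost')

@app.task
def add(x, y):
    return x + y

# .delay() returns an AsyncResult immediately; .get() blocks until the
# worker has stored the return value in the result backend.
result = add.delay(2, 3)
print(result.get(timeout=10))  # 5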

Does Celery support multiprocessing?

Multiprocess programming is achieved by using Celery workers (subprocesses). Each of them executes the task (a function, a series of jobs, …) you have given it and sends the result back to the caller.
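As a sketch, the size of the worker's process pool is set when launching the worker (the module name tasks is an assumption):

from celery import Celery

app = Celery('tasks', broker='redis://localhost')  # broker URL is an assumption

if __name__ == '__main__':
    # Start a worker with a pool of 4 subprocesses (prefork pool),
    # equivalent to running: celery -A tasks worker --concurrency=4
    app.worker_main(['worker', '--concurrency=4'])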

How does Celery execute tasks?

Celery workers are worker processes that run tasks independently from one another and outside the context of your main service. Celery beat is a scheduler that orchestrates when to run tasks. You can use it to schedule periodic tasks as well.
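A minimal beat schedule sketch (the task name and timing are assumptions):

from celery import Celery
from celery.schedules import crontab

app = Celery('tasks', broker='redis://localhost')  # broker URL is an assumption

app.conf.beat_schedule = {
    'flush-pending-notifications': {
        'task': 'tasks.flush_notifications',  # hypothetical task name
        'schedule': crontab(),  # run every minute
    },
}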

What is shared_task in Celery?

The "shared_task" decorator allows creation of Celery tasks for reusable apps as it doesn't need the instance of the Celery app. It is also easier way to define a task as you don't need to import the Celery app instance.


2 Answers

I recently discovered and have implemented the celery.contrib.batches module in my project. In my opinion it is a nicer solution than Tommaso's answer, because you don't need an extra layer of storage.

Here is an example adapted from the docs (imports added to make it self-contained):

A click counter that flushes the buffer every 100 messages, or every 10 seconds. Does not do anything with the data, but can easily be modified to store it in a database.

from collections import Counter

from celery import Celery
from celery.contrib.batches import Batches

app = Celery('tasks', broker='redis://localhost')  # broker URL is an assumption

# Flush after 100 messages, or 10 seconds.
@app.task(base=Batches, flush_every=100, flush_interval=10)
def count_click(requests):
    # `requests` is the list of buffered request objects for this batch.
    counts = Counter(request.kwargs['url'] for request in requests)
    for url, count in counts.items():
        print('>>> Clicks: {0} -> {1}'.format(url, count))
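Calls look like any other task invocation; the worker buffers them and passes the whole batch to the function:

# Each call returns immediately; the worker flushes the buffer after
# 100 calls or 10 seconds, whichever comes first.
count_click.delay(url='http://example.com')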

Be wary though: it works fine for my usage, but the documentation describes it as an "Experimental task class". This might deter some from using a feature with such a volatile description :)

answered Oct 12 '22 by gak


An easy way to accomplish this is to write all the actions a task should take to persistent storage (e.g. a database) and let a periodic job do the actual processing in one batch (with a single connection). Note: make sure you have some locking in place to prevent the queue from being processed twice!
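A minimal sketch of that pattern, assuming SQLite as the persistent store, an existing pending(user_id, message) table, and a hypothetical open_push_connection() helper for the single shared connection:

import sqlite3

from celery import Celery

app = Celery('notifications', broker='redis://localhost')  # broker URL is an assumption

DB_PATH = 'pending.db'  # hypothetical storage location

@app.task
def queue_push(user_id, message):
    # Instead of sending immediately, persist the work for the batch job.
    with sqlite3.connect(DB_PATH) as db:
        db.execute('INSERT INTO pending (user_id, message) VALUES (?, ?)',
                   (user_id, message))

@app.task
def flush_pending():
    # Periodic task (e.g. scheduled via celery beat): drain the table and
    # send everything over a single connection. Reading and deleting in
    # one transaction is only crude protection against double processing;
    # real locking is left to the reader.
    with sqlite3.connect(DB_PATH) as db:
        rows = db.execute('SELECT user_id, message FROM pending').fetchall()
        db.execute('DELETE FROM pending')
    if rows:
        with open_push_connection() as conn:  # hypothetical shared connection
            for user_id, message in rows:
                conn.send(user_id, message)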

There is a nice example of how to do something similar at the kombu level (http://ask.github.com/celery/tutorials/clickcounter.html).

Personally, I like the way Sentry does something like this to batch increments at the database level (see its sentry.buffers module).

answered Oct 12 '22 by Tommaso Barbugli