
Difference between celery get and join

Tags: python, celery

Is there any difference between:

 r = group(some_task.s(i) for i in range(10)).apply_async()
 result = r.join()

And:

 r = group(some_task.s(i) for i in range(10))()
 result = r.get()

The Celery documentation uses both forms, and I do not see any difference between them.

Glueon asked Oct 18 '15 14:10


2 Answers

Short answer

The get and join methods of a group result should return the same results. The difference is that get first returns a cached result if one is available and otherwise uses join_native when the result backend supports it, so it is likely to be more efficient on such backends. Unless you really need join for some edge case, use get.

Long answer

Here is the source for the get method of Celery's ResultSet class, which the GroupResult class extends.

def get(self, timeout=None, propagate=True, interval=0.5,
        callback=None, no_ack=True, on_message=None):
    """See :meth:`join`
    This is here for API compatibility with :class:`AsyncResult`,
    in addition it uses :meth:`join_native` if available for the
    current result backend.
    """
    if self._cache is not None:
        return self._cache
    return (self.join_native if self.supports_native_join else self.join)(
        timeout=timeout, propagate=propagate,
        interval=interval, callback=callback, no_ack=no_ack,
        on_message=on_message,
    )

The first thing to notice is that the docstring points to the join method for documentation, which already suggests that the two methods are very similar.

Looking at the body of the get method, we can see that it first checks for a cached value and returns that if it's set. If no cached value is found, get will call either the join or the join_native method depending on whether the backend supports native joins. If you find the format of that return statement a little confusing, this is essentially the same thing:

if self.supports_native_join:
    return self.join_native(timeout=timeout,
                            propagate=propagate,
                            interval=interval,
                            callback=callback,
                            no_ack=no_ack,
                            on_message=on_message)
else:
    return self.join(timeout=timeout,
                     propagate=propagate,
                     interval=interval,
                     callback=callback,
                     no_ack=no_ack,
                     on_message=on_message)

The docstring for the join method has this to say:

This can be an expensive operation for result store backends that must resort to polling (e.g., database). You should consider using join_native if your backend supports it.

So you should be calling join_native instead of join if your backend supports it. But why bother resorting to conditionally calling one or the other if get wraps up this logic for you? Just use get instead.
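The cache check and the dispatch described above can be imitated without Celery. The following is only a minimal sketch: ToyResultSet and its calls attribute are hypothetical names made up for illustration, and the toy fills its cache inside get for brevity, whereas the get source quoted above only reads _cache.

```python
class ToyResultSet:
    """Hypothetical stand-in for a result set; this is not Celery's code."""

    def __init__(self, values, supports_native_join):
        self._values = list(values)
        self.supports_native_join = supports_native_join
        self._cache = None   # filled after the first collection (simplification)
        self.calls = []      # records which collection strategy was used

    def join(self):
        # Stand-in for the generic, polling-based strategy.
        self.calls.append("join")
        return list(self._values)

    def join_native(self):
        # Stand-in for the backend-optimised strategy.
        self.calls.append("join_native")
        return list(self._values)

    def get(self):
        # Same shape as the quoted method: cache first, then pick a strategy.
        if self._cache is not None:
            return self._cache
        collect = self.join_native if self.supports_native_join else self.join
        self._cache = collect()
        return self._cache


native = ToyResultSet([1, 2, 3], supports_native_join=True)
native.get()
native.get()                 # second call is served from the cache
print(native.calls)          # ['join_native']

polling = ToyResultSet([4], supports_native_join=False)
polling.get()
print(polling.calls)         # ['join']
```

The caller never has to know which strategy was chosen, which is the reason for preferring get.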

Sean answered Sep 20 '22 04:09


The difference comes down to groups versus chords. The question is whether you want the results from all your tasks, or a task that does something with those results.

Groups are used to start several tasks and then join the results in the order that they were invoked.

>>> job = group([
...             add.subtask((2, 2)),
...             add.subtask((4, 4)),
...             add.subtask((8, 8)),
...             add.subtask((16, 16)),
...             add.subtask((32, 32)),
... ])
>>> result = job.apply_async()
>>> result.join()
[4, 8, 16, 32, 64]
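If you want to see the "results in invocation order" behaviour without a broker, here is a rough analogy using only the standard library. It is not Celery code: run_group is a hypothetical helper, and a thread pool stands in for the workers. The sleep is arranged so that later calls finish first, yet the collected list still follows submission order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    # Larger inputs sleep less, so completion order differs from submission order.
    time.sleep(0.05 / x)
    return x + y


def run_group(args_list):
    """Submit every call, then collect the results in submission order."""
    with ThreadPoolExecutor(max_workers=len(args_list)) as pool:
        futures = [pool.submit(add, x, y) for x, y in args_list]
        return [future.result() for future in futures]


results = run_group([(2, 2), (4, 4), (8, 8), (16, 16), (32, 32)])
print(results)  # [4, 8, 16, 32, 64]
```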

Chords are used when you want a task that executes after all the specified tasks have finished.

>>> callback = last_task.subtask()
>>> tasks = [task.subtask(...) ... ]
>>> result = chord(tasks)(callback)
>>> result.get()
<output from last_task, which has access to the results from the tasks>

You can learn more about these here: http://ask.github.io/celery/userguide/tasksets.html
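As a rough standard-library analogy for the chord pattern shown above (again not Celery code; run_chord, square and total are hypothetical names), the header calls run in parallel and a single callback then receives the full list of their results:

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def total(values):
    # Plays the role of the chord callback: it receives all header results.
    return sum(values)


def run_chord(header_func, header_args, callback):
    """Run header_func over header_args in parallel, then call callback once."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        header_results = list(pool.map(header_func, header_args))
    return callback(header_results)


chord_result = run_chord(square, range(5), total)
print(chord_result)  # 30
```

The value you get back is the callback's return value, not the list of individual results, which is the practical difference from a plain group.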

olofom answered Sep 21 '22 04:09