Is there any difference between:
r = group(some_task.s(i) for i in range(10)).apply_async()
result = r.join()
And:
r = group(some_task.s(i) for i in range(10))()
result = r.get()
The Celery documentation uses both forms, and I do not see any difference between them.
Short answer
While the `get` and `join` methods for a group should return the same results, `get` implements some caching and will probably be more efficient, depending on the backend you're using. Unless you really need `join` for some edge case, you should use `get`.
Long answer
Here is the source for the `get` method of Celery's `ResultSet` class, which the `GroupResult` class extends.
def get(self, timeout=None, propagate=True, interval=0.5,
        callback=None, no_ack=True, on_message=None):
    """See :meth:`join`
    This is here for API compatibility with :class:`AsyncResult`,
    in addition it uses :meth:`join_native` if available for the
    current result backend.
    """
    if self._cache is not None:
        return self._cache
    return (self.join_native if self.supports_native_join else self.join)(
        timeout=timeout, propagate=propagate,
        interval=interval, callback=callback, no_ack=no_ack,
        on_message=on_message,
    )
The first thing we see is that the docstring tells us to look at the `join` method for documentation. Right off the bat, this is an indication that the methods are very similar.
Looking at the body of the `get` method, we can see that it first checks for a cached value and returns that if it's set. If no cached value is found, `get` will call either the `join` or the `join_native` method, depending on whether the backend supports native joins. If you find the format of that `return` statement a little confusing, this is essentially the same thing:
if self.supports_native_join:
    return self.join_native(timeout=timeout,
                            propagate=propagate,
                            interval=interval,
                            callback=callback,
                            no_ack=no_ack,
                            on_message=on_message)
else:
    return self.join(timeout=timeout,
                     propagate=propagate,
                     interval=interval,
                     callback=callback,
                     no_ack=no_ack,
                     on_message=on_message)
The docstring for the `join` method has this to say:
"This can be an expensive operation for result store backends that must resort to polling (e.g., database). You should consider using `join_native` if your backend supports it."
So you should be calling `join_native` instead of `join` if your backend supports it. But why bother conditionally calling one or the other when `get` wraps up this logic for you? Just use `get` instead.
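To make the dispatch concrete, here is a minimal toy model of the logic described above. It is not Celery's code: `ToyResultSet` and its `calls` list are hypothetical names invented for illustration, and the cache is set by hand here because the way Celery fills `_cache` is not shown in the quoted source.

```python
class ToyResultSet:
    """Toy stand-in for a result set; NOT Celery's implementation."""

    def __init__(self, values, supports_native_join):
        self._values = list(values)
        self.supports_native_join = supports_native_join
        self._cache = None   # set by hand below to demonstrate the short-circuit
        self.calls = []      # records which join variant actually ran

    def join(self, **kwargs):
        # Stands in for the generic (possibly polling) join.
        self.calls.append("join")
        return list(self._values)

    def join_native(self, **kwargs):
        # Stands in for the backend-specific join.
        self.calls.append("join_native")
        return list(self._values)

    def get(self, **kwargs):
        # Same shape as the quoted method: cache first, then pick a join.
        if self._cache is not None:
            return self._cache
        joiner = self.join_native if self.supports_native_join else self.join
        return joiner(**kwargs)


polling = ToyResultSet([0, 1, 4], supports_native_join=False)
native = ToyResultSet([0, 1, 4], supports_native_join=True)

polling_values = polling.get()   # dispatches to join
native_values = native.get()     # dispatches to join_native

native._cache = ["cached"]       # simulate a cache that has been filled
cached_values = native.get()     # returns the cache, no join call at all
```

In this sketch both variants return the same values, and the only difference a caller can observe is which method ran. That mirrors the point above: calling `get` lets the object choose the cheaper path.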
The difference is the same as the difference between groups and chords. The question is whether you want the results from all your tasks, or a task that does something with those results.
Groups are used to start several tasks and then join their results in the order in which they were invoked.
>>> job = group([
... add.subtask((2, 2)),
... add.subtask((4, 4)),
... add.subtask((8, 8)),
... add.subtask((16, 16)),
... add.subtask((32, 32)),
... ])
>>> result = job.apply_async()
>>> result.join()
[4, 8, 16, 32, 64]
Chords are for when you want a task that executes after all the specified tasks are done.
>>> callback = last_task.subtask()
>>> tasks = [task.subtask(...) ... ]
>>> result = chord(tasks)(callback)
>>> result.get()
<output from last_task, which has access to the results from the tasks>
You can learn more about these here: http://ask.github.io/celery/userguide/tasksets.html
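As a rough analogy only, the two patterns can be sketched with Python's standard library instead of Celery. The code below uses `concurrent.futures` rather than a Celery worker, and `total` is a hypothetical callback standing in for `last_task`.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    return x + y


def total(values):
    # Hypothetical callback: receives the list of results from all tasks.
    return sum(values)


pairs = [(2, 2), (4, 4), (8, 8), (16, 16), (32, 32)]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(add, x, y) for x, y in pairs]
    # Group-like: wait for every task and collect results in submission order.
    group_result = [f.result() for f in futures]

# Chord-like: one follow-up step that runs once all results are available.
chord_result = total(group_result)

print(group_result)  # [4, 8, 16, 32, 64]
print(chord_result)  # 124
```

The group-like step hands you the list of results, while the chord-like step hands you whatever the callback returns after it has consumed that list.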