I wonder why a Celery chain is so slow compared to an ad hoc solution.
In the ad hoc solution I forward the task manually; the drawback is that I cannot wait for the end of the chain.
In the following code, the canvas solution takes 16 seconds and the ad hoc one takes 3 seconds. I wonder whether other canvas primitives are also slow compared to naive solutions.
import sys
from celery import Celery, chain
from celery.task import task
from datetime import datetime

broker = "amqp://admin:[email protected]:5672/tasks"
backend = 'redis://:[email protected]:6379/1'

app = Celery(
    "celery-bench",
    broker=broker,
    backend=backend
)
app.conf.accept_content = ['json']
app.conf.task_serializer = 'json'
app.conf.result_serializer = 'json'


@task(name="result", queue="bench-results")
def result(result):
    return result


@task(name="simple-task-auto-chain", queue="bench-tasks")
def simple_task_auto_chain(date, arg):
    if arg >= 0:
        simple_task_auto_chain.delay(date, arg - 1)
        return arg
    else:
        return result.delay(
            "AutoChain %s" % (str(datetime.now() - datetime.fromisoformat(date)))
        )


@task(name="simple-task", queue="bench-tasks")
def simple_task(args):
    date, arg = args
    if arg >= 0:
        return (date, arg - 1)
    else:
        return result.s(
            "CanvasChain %s" % (str(datetime.now() - datetime.fromisoformat(date)))
        ).delay()


def bench_auto_chain(n=1000):
    now = datetime.now()
    simple_task_auto_chain.delay(now, n)


def bench_canvas_chain(n=1000):
    now = datetime.now()
    chain(
        simple_task.s((now, n)),
        *[simple_task.s()] * (n + 1),
    ).delay()
# celery -A benchs-chain worker -l info --concurrency 1 --queues bench-results
# celery -A benchs-chain worker -l info --concurrency 1 --queues bench-tasks
# ./benchs-chain.py auto (~3s)
# ./benchs-chain.py canvas (~16s)
if __name__ == '__main__':
    if len(sys.argv) > 1:
        if 'canvas' in sys.argv:
            bench_canvas_chain()
        if 'auto' in sys.argv:
            bench_auto_chain()
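The stated drawback of the ad hoc version (no way to wait for the end of the chain) can be worked around: the final hop can publish to a result channel that the caller blocks on, which is essentially what the `result` task and the Redis backend do above. A minimal stdlib sketch of that idea, using threads in place of workers and a `queue.Queue` in place of the broker/backend (a toy model, not Celery code):

```python
import queue
import threading


def adhoc_chain_waitable(n):
    """Forward the task manually, like simple_task_auto_chain, but have
    the last hop push to a result queue so the caller can block on the
    end of the chain."""
    done = queue.Queue()

    def task(arg):
        if arg >= 0:
            # "forward" the task: a thread stands in for .delay()
            threading.Thread(target=task, args=(arg - 1,)).start()
        else:
            done.put("finished")  # plays the role of the result task

    task(n)
    return done.get(timeout=10)  # caller waits for the end of the chain
```

In Celery terms, `result.delay(...)` already returns an `AsyncResult` that could be stored and polled; what the ad hoc approach really lacks is a first-class handle for the chain as a whole.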
Edit:
I think we ended up with something like this, which is why the canvas chain performs so badly.
Yes, you are right. Your method will be faster for this case.
Quote from Celery documentation:
The synchronization step is costly, so you should avoid using chords as much as possible. Still, the chord is a powerful primitive to have in your toolbox as synchronization is a required step for many parallel algorithms.
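Why that synchronization step is costly can be modeled in a few lines. In a naive chord implementation, every finished header task triggers a scan of the result backend to check whether all sibling results have arrived (O(n) work per task, O(n²) overall), whereas a chain only ever forwards to a single successor. A toy stdlib sketch of that difference (an illustration of the scaling argument, not Celery's actual implementation):

```python
def run_chord_naive(n):
    """Each completed header task scans the backend for all n results."""
    backend = {}  # stands in for the result backend
    checks = 0
    for task_id in range(n):
        backend[task_id] = task_id * 2       # the header task's result
        # synchronization step: is every sibling result there yet?
        checks += sum(1 for t in range(n) if t in backend)
    return checks


def run_chain(n):
    """A chain does no synchronization: each task forwards exactly once."""
    forwards = 0
    for _ in range(n):
        forwards += 1
    return forwards
```

For n = 1000 tasks the naive chord does about half a million backend checks while the chain does a thousand forwards, which is the quadratic-versus-linear gap the documentation is warning about.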
Chain also has a lot more functionality than the auto-chain approach.
As you can see, about half of the time goes into just creating the chain (~18 sec).
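One way to check that split is to time the two phases separately: building the signatures and actually sending them. A small hypothetical harness (the `build` and `dispatch` callables are placeholders you would fill with the `chain(...)` construction and `.delay()` call from the benchmark):

```python
import time


def time_phases(build, dispatch):
    """Return (construction_time, dispatch_time) for a two-phase job."""
    t0 = time.perf_counter()
    sig = build()                  # e.g. chain(simple_task.s(...), ...)
    t1 = time.perf_counter()
    dispatch(sig)                  # e.g. sig.delay()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

Wrapping `bench_canvas_chain` in this would show how much of the ~16 s elapses before any message even reaches the broker.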
Under the hood, chain uses chord. And they both consume more memory and have many preparation steps to run, as you described in the question.
When you call the next task from the parent task, you create a single task that doesn't know what comes next, whether it is at the end of the chain or a few steps before it. Another thing is that for longer-running tasks you won't feel that time difference. And finally, you lose a lot of information, which you probably don't need in this simple scenario.
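That "task which doesn't know what will be next" point shows up in the message payloads. A chain message carries the whole remaining tail of signatures along with it, so every hop knows exactly what follows, at the cost of larger messages; the ad hoc version sends a minimal message that only names the next task. A toy stdlib sketch of the payload sizes (simplified message shapes, not Celery's real wire format):

```python
import json


def chain_payload_sizes(n):
    """Each chain message embeds the remaining signatures in its body."""
    remaining = [{"task": "simple_task", "args": []} for _ in range(n)]
    sizes = []
    while remaining:
        head, *tail = remaining
        message = {"task": head["task"], "chain": tail}  # tail travels along
        sizes.append(len(json.dumps(message)))
        remaining = tail
    return sizes


def adhoc_payload_sizes(n):
    """Manual forwarding sends the same minimal message at every hop."""
    return [len(json.dumps({"task": "simple_task", "args": []}))] * n
```

The first chain message is the largest and the sizes shrink as the tail is consumed, while the ad hoc payload stays constant, which is part of why it serializes and routes faster.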