I wonder why a Celery chain is so slow compared to an ad hoc solution.
In the ad hoc solution I forward the task manually; the drawback is that I cannot wait for the end of the chain.
In the following code, the canvas solution takes 16 seconds and the ad hoc one takes 3 seconds. I wonder whether other canvas primitives are also slow compared to naive solutions.
import sys
from celery import Celery, chain
from celery.task import task
from datetime import datetime

broker = "amqp://admin:[email protected]:5672/tasks"
backend = 'redis://:[email protected]:6379/1'

app = Celery(
    "celery-bench",
    broker=broker,
    backend=backend
)
app.conf.accept_content = ['json']
app.conf.task_serializer = 'json'
app.conf.result_serializer = 'json'


@task(name="result", queue="bench-results")
def result(result):
    return result


@task(name="simple-task-auto-chain", queue="bench-tasks")
def simple_task_auto_chain(date, arg):
    if arg >= 0:
        simple_task_auto_chain.delay(date, arg - 1)
        return arg
    else:
        return result.delay(
            "AutoChain %s" % (str(datetime.now() - datetime.fromisoformat(date)))
        )


@task(name="simple-task", queue="bench-tasks")
def simple_task(args):
    date, arg = args
    if arg >= 0:
        return (date, arg - 1)
    else:
        return result.s(
            "CanvasChain %s" % (str(datetime.now() - datetime.fromisoformat(date)))
        ).delay()


def bench_auto_chain(n=1000):
    now = datetime.now()
    simple_task_auto_chain.delay(now, n)


def bench_canvas_chain(n=1000):
    now = datetime.now()
    chain(
        simple_task.s((now, n)),
        *[simple_task.s()] * (n + 1),
    ).delay()
# celery -A benchs-chain worker -l info --concurrency 1 --queues bench-results
# celery -A benchs-chain worker -l info --concurrency 1 --queues bench-tasks
# ./benchs-chain.py auto (~3s)
# ./benchs-chain.py canvas (~16s)
if __name__ == '__main__':
    if len(sys.argv) > 1:
        if 'canvas' in sys.argv:
            bench_canvas_chain()
        if 'auto' in sys.argv:
            bench_auto_chain()
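The stated drawback of the ad hoc version (no way to wait for the end of the chain) can be worked around: the final hop can publish to a result channel that the caller blocks on, which is essentially what the `result` task and the Redis backend do above. A minimal stdlib sketch of that idea, using threads in place of workers and a `queue.Queue` in place of the broker/backend (a toy model, not Celery code):

```python
import queue
import threading


def adhoc_chain_waitable(n):
    """Forward the task manually, like simple_task_auto_chain, but have
    the last hop push to a result queue so the caller can block on the
    end of the chain."""
    done = queue.Queue()

    def task(arg):
        if arg >= 0:
            # "forward" the task: a thread stands in for .delay()
            threading.Thread(target=task, args=(arg - 1,)).start()
        else:
            done.put("finished")  # plays the role of the result task

    task(n)
    return done.get(timeout=10)  # caller waits for the end of the chain
```

In Celery terms, `result.delay(...)` already returns an `AsyncResult` that could be stored and polled; what the ad hoc approach really lacks is a first-class handle for the chain as a whole.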
Edit:
I think we ended up with something like this, which is why the canvas chain performs so badly.
Yes, you are right. Your method will be faster for this case.
Quote from Celery documentation:
The synchronization step is costly, so you should avoid using chords as much as possible. Still, the chord is a powerful primitive to have in your toolbox as synchronization is a required step for many parallel algorithms.
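Why that synchronization step is costly can be modeled in a few lines. In a naive chord implementation, every finished header task triggers a scan of the result backend to check whether all sibling results have arrived (O(n) work per task, O(n²) overall), whereas a chain only ever forwards to a single successor. A toy stdlib sketch of that difference (an illustration of the scaling argument, not Celery's actual implementation):

```python
def run_chord_naive(n):
    """Each completed header task scans the backend for all n results."""
    backend = {}  # stands in for the result backend
    checks = 0
    for task_id in range(n):
        backend[task_id] = task_id * 2       # the header task's result
        # synchronization step: is every sibling result there yet?
        checks += sum(1 for t in range(n) if t in backend)
    return checks


def run_chain(n):
    """A chain does no synchronization: each task forwards exactly once."""
    forwards = 0
    for _ in range(n):
        forwards += 1
    return forwards
```

For n = 1000 tasks the naive chord does about half a million backend checks while the chain does a thousand forwards, which is the quadratic-versus-linear gap the documentation is warning about.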
Chain also has a lot more functionality than the auto-chain approach.
As you can see, about half of the time goes into just creating the chain (~18 sec).
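One way to check that split is to time the two phases separately: building the signatures and actually sending them. A small hypothetical harness (the `build` and `dispatch` callables are placeholders you would fill with the `chain(...)` construction and `.delay()` call from the benchmark):

```python
import time


def time_phases(build, dispatch):
    """Return (construction_time, dispatch_time) for a two-phase job."""
    t0 = time.perf_counter()
    sig = build()                  # e.g. chain(simple_task.s(...), ...)
    t1 = time.perf_counter()
    dispatch(sig)                  # e.g. sig.delay()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

Wrapping `bench_canvas_chain` in this would show how much of the ~16 s elapses before any message even reaches the broker.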
Under the hood, chain uses chord. And they both consume more memory and have many preparation steps to run, as you described in the question.
When you call the next task from the parent task, you create a single task that doesn't know what comes next, whether it is at the end of the chain or a few steps before it. Another thing is that for longer-running tasks you won't feel that time difference. And finally, you lose a lot of information, which you probably don't need in this simple scenario.
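That "task which doesn't know what will be next" point shows up in the message payloads. A chain message carries the whole remaining tail of signatures along with it, so every hop knows exactly what follows, at the cost of larger messages; the ad hoc version sends a minimal message that only names the next task. A toy stdlib sketch of the payload sizes (simplified message shapes, not Celery's real wire format):

```python
import json


def chain_payload_sizes(n):
    """Each chain message embeds the remaining signatures in its body."""
    remaining = [{"task": "simple_task", "args": []} for _ in range(n)]
    sizes = []
    while remaining:
        head, *tail = remaining
        message = {"task": head["task"], "chain": tail}  # tail travels along
        sizes.append(len(json.dumps(message)))
        remaining = tail
    return sizes


def adhoc_payload_sizes(n):
    """Manual forwarding sends the same minimal message at every hop."""
    return [len(json.dumps({"task": "simple_task", "args": []}))] * n
```

The first chain message is the largest and the sizes shrink as the tail is consumed, while the ad hoc payload stays constant, which is part of why it serializes and routes faster.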