What's the best way to handle Celery tasks whose results are large? I'm thinking of things like table dumps, where I might be returning hundreds of megabytes of data.
I suspect the naive approach of cramming the result into the result database isn't going to serve me here, much less if I use AMQP as my result backend. However, latency is an issue for some of these exports: depending on the particular instance, I sometimes have to block until the task returns and emit the export data directly from the task client (an HTTP request came in for the export content, it doesn't exist yet, and must be provided in the response to that request ... no matter how long that takes).
So, what's the best way to write tasks for this?
One option would be to have a static HTTP server running on all of your worker machines. Your task can then dump the large result to a unique file in the static root and return a URL reference to the file. The receiver can then fetch the result at its leisure.
e.g. something vaguely like this:
import socket

@task
def dump_db(db):
    # Some code to dump the DB to /srv/http/static/<db>.sql
    return 'http://%s/%s.sql' % (socket.gethostname(), db)
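For the blocking case you describe, where an HTTP request has to wait for the export, the caller could wait on the task result and then stream the file straight through to the client. A minimal sketch, assuming a Django view and the requests library (the view name and content type are just placeholders), using the dump_db task above:

import requests
from django.http import StreamingHttpResponse

def export_view(request, db):
    # Block until the worker finishes and hands back the file's URL
    url = dump_db.delay(db).get()

    # Stream the dump from the worker's static server to the HTTP client
    upstream = requests.get(url, stream=True)
    return StreamingHttpResponse(upstream.iter_content(chunk_size=8192),
                                 content_type='application/sql')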
You would of course need some means of reaping old files and of guaranteeing unique filenames, and there are probably other issues to handle, but you get the general idea.
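For the uniqueness part, one possibility (just a sketch; the STATIC_ROOT path and filename scheme are assumptions) is to build the filename from a uuid inside the task:

import os
import socket
import uuid

STATIC_ROOT = '/srv/http/static'

@task
def dump_db(db):
    # Unique filename so concurrent dumps of the same DB don't collide
    filename = '%s-%s.sql' % (db, uuid.uuid4().hex)
    path = os.path.join(STATIC_ROOT, filename)
    # ... dump the DB to `path` here ...
    return 'http://%s/%s' % (socket.gethostname(), filename)

Reaping could then be a periodic task or a plain cron job that deletes files older than some age.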