
How should I use Celery when task results are large?

What's the best way to handle tasks executed in Celery where the result is large? I'm thinking of things like table dumps and the like, where I might be returning data in the hundreds of megabytes.

I suspect the naive approach of cramming the result into the result database isn't going to serve me here, much less if I use AMQP as my result backend. However, for some of these exports latency is an issue: depending on the particular instance, I sometimes have to block until the task finishes and emit the export data directly from the task client (an HTTP request came in for the export content, it doesn't exist yet, but it must be provided in the response to that request ... no matter how long that takes).

So, what's the best way to write tasks for this?

Chris R asked Nov 22 '10

1 Answer

One option would be to have a static HTTP server running on all of your worker machines. Your task can then dump the large result to a unique file in the static root and return a URL reference to the file. The receiver can then fetch the result at its leisure.

e.g., something vaguely like this:

import socket

@task  # Celery's task decorator (e.g. your app's task decorator)
def dump_db(db):
    # Dump the database to a file under the static HTTP root,
    # e.g. /srv/http/static/<db>.sql (dump code omitted)
    return 'http://%s/%s.sql' % (socket.gethostname(), db)

You would of course need some means of reaping old files, as well as guaranteeing unique filenames, and there are probably other issues, but you get the general idea.
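For the blocking case in the question (an HTTP request that must wait for the export and return its contents), the web process can wait on the task result, which is now just a short URL rather than hundreds of megabytes, and then stream the file through to the client. A minimal sketch, assuming the dump_db task above, the requests library, and Flask purely as an illustrative framework; export_view and the timeout value are placeholders:

import requests
from flask import Response  # assumption: Flask; any framework with streaming responses works similarly

def export_view(db):
    # Kick off the dump and block until the worker returns the file's URL.
    # Only the short URL travels through the result backend, not the dump itself.
    result = dump_db.delay(db)
    url = result.get(timeout=3600)  # wait as long as the export is expected to take

    # Stream the dump from the worker's static HTTP server to the client in
    # chunks, so the web process never holds the whole file in memory.
    upstream = requests.get(url, stream=True)
    return Response(upstream.iter_content(chunk_size=64 * 1024),
                    mimetype='application/octet-stream')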

Alec Thomas answered Nov 21 '22