What's the best way to handle Celery tasks whose results are large? I'm thinking of things like table dumps, where I might be returning hundreds of megabytes of data.
I suspect the naive approach of cramming the result into the result database isn't going to serve me here, much less if I use AMQP as my result backend. However, latency is an issue for some of these exports: depending on the particular instance, I sometimes have to block until the task returns and emit the export data directly from the task client (an HTTP request came in for the export content, it doesn't exist yet, and must be provided in the response to that request ... no matter how long that takes).
So, what's the best way to write tasks for this?
One option would be to have a static HTTP server running on all of your worker machines. Your task can then dump the large result to a unique file in the static root and return a URL reference to the file. The receiver can then fetch the result at its leisure.
e.g. something vaguely like this:
import socket

@task
def dump_db(db):
    # Some code to dump the DB to /srv/http/static/<db>.sql
    return 'http://%s/%s.sql' % (socket.gethostname(), db)
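For the blocking case you describe, where an HTTP request has to wait for the export, the caller could wait on the task result and then stream the file straight through to the client. A minimal sketch, assuming a Django view and the requests library (the view name and content type are just placeholders), using the dump_db task above:

import requests
from django.http import StreamingHttpResponse

def export_view(request, db):
    # Block until the worker finishes and hands back the file's URL
    url = dump_db.delay(db).get()

    # Stream the dump from the worker's static server to the HTTP client
    upstream = requests.get(url, stream=True)
    return StreamingHttpResponse(upstream.iter_content(chunk_size=8192),
                                 content_type='application/sql')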
You would of course need some means of reaping old files and of guaranteeing unique filenames, and there are probably other issues to handle, but you get the general idea.
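For the uniqueness part, one possibility (just a sketch; the STATIC_ROOT path and filename scheme are assumptions) is to build the filename from a uuid inside the task:

import os
import socket
import uuid

STATIC_ROOT = '/srv/http/static'

@task
def dump_db(db):
    # Unique filename so concurrent dumps of the same DB don't collide
    filename = '%s-%s.sql' % (db, uuid.uuid4().hex)
    path = os.path.join(STATIC_ROOT, filename)
    # ... dump the DB to `path` here ...
    return 'http://%s/%s' % (socket.gethostname(), filename)

Reaping could then be a periodic task or a plain cron job that deletes files older than some age.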