I recently integrated Celery (django-celery, to be more specific) into one of my applications. I have a model in the application as follows:
class UserUploadedFile(models.Model):
    original_file = models.FileField(upload_to='/uploads/')
    txt = models.FileField(upload_to='/uploads/')
    pdf = models.FileField(upload_to='/uploads/')
    doc = models.FileField(upload_to='/uploads/')

    def convert_to_others(self):
        # Code to convert the original file to other formats
        pass
Now, once a user uploads a file, I want to convert the original file to txt, pdf, and doc formats. Calling the convert_to_others method is a fairly expensive process, so I plan to do it asynchronously using Celery. So I wrote a simple Celery task as follows:
@celery.task(default_retry_delay=bdev.settings.TASK_RETRY_DELAY)
def convert_ufile(file, request):
    """
    This task method would call a UserUploadedFile object's convert_to_others
    method to do the file conversions.

    The best way to call this task would be doing it asynchronously
    using the apply_async method.
    """
    try:
        file.convert_to_others()
    except Exception, err:
        # If the task fails, log the exception and retry in 30 secs
        log.LoggingMiddleware.log_exception(request, err)
        convert_ufile.retry(exc=err)
    return True
I then called the task as follows:
ufile = get_object_or_404(models.UserUploadedFile, pk=id)
tasks.convert_ufile.apply_async(args=[ufile, request])
Now, when the apply_async method is called, it raises the following exception:
PicklingError: Can't pickle <type 'cStringIO.StringO'>: attribute lookup cStringIO.StringO failed
I think this is because Celery (by default) uses the pickle library to serialize data, and pickle is not able to serialize the binary file. Are there any other serializers that can serialize a binary file on their own? If not, how can I serialize a binary file using the default pickle serializer?
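For context, the serializer Celery uses for task messages is configurable, but switching it away from pickle would not help here: none of the built-in alternatives (json, yaml, msgpack) can encode an open file handle either. A minimal sketch of the relevant settings, assuming the uppercase Celery 3.x-era names used with django-celery (newer Celery releases use lowercase names such as task_serializer):

# settings.py -- illustrative only; changing the serializer changes the
# message format but does not make a file object serializable.
CELERY_TASK_SERIALIZER = 'json'      # the default in this Celery generation is 'pickle'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']     # Celery 3.1+: refuse pickled messages entirely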
apply_async(args[, kwargs[, …]]) sends a task message. delay(*args, **kwargs) is a shortcut for sending a task message, but it doesn't support execution options.

A task performs dual roles: it defines both what happens when the task is called (a message is sent) and what happens when a worker receives that message. Every task class has a unique name, and this name is referenced in messages so that the worker can find the right function to execute.

As for the results backend: delay places the task on the queue and returns a promise that can be used to monitor the status and get the result when it's ready. Calling get on that promise blocks execution until the result is available.
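For illustration, a minimal sketch of those two calling styles, using a trivial task defined here only for the example (the decorator form mirrors the one in the question; result.get() additionally requires a result backend to be configured):

import celery

@celery.task()
def add(x, y):
    # Trivial task used only to demonstrate the calling styles.
    return x + y

# Shortcut form: arguments only, no execution options.
result = add.delay(2, 2)

# Full form: the same message, but execution options such as countdown are allowed.
result = add.apply_async(args=[2, 2], countdown=10)

# get() blocks until a worker has stored the result (needs a result backend).
print(result.get(timeout=30))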
You are correct that Celery tries to pickle data for which pickling is unsupported. But even if you found a way to serialize the data you want to send to the task, I wouldn't do it.
It is always a good idea to send as little data as possible in a task message, so in your case I would pass only the id of the UserUploadedFile instance. The task can then fetch the object by id and call convert_to_others().

Please also note that the object could change state (or even be deleted) before the task is executed, so it is much safer to fetch the object inside the task than to send a full copy of it.
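A minimal sketch of that approach, reusing names from the question (the import path, the 30-second retry delay, and the view snippet are illustrative; the request argument and the logging call are dropped, since only the primary key needs to travel in the message):

import celery

from myapp import models  # hypothetical module holding UserUploadedFile

@celery.task(default_retry_delay=30)
def convert_ufile(file_id):
    """Fetch the UserUploadedFile by primary key and run the conversion."""
    try:
        ufile = models.UserUploadedFile.objects.get(pk=file_id)
    except models.UserUploadedFile.DoesNotExist:
        # The object was deleted before the worker got to it; nothing to do.
        return False
    try:
        ufile.convert_to_others()
    except Exception as err:
        # Let Celery reschedule the task after default_retry_delay seconds.
        convert_ufile.retry(exc=err)
    return True

The view then sends only the integer primary key:

ufile = get_object_or_404(models.UserUploadedFile, pk=id)
tasks.convert_ufile.apply_async(args=[ufile.pk])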
To sum up, sending only the instance id and refetching the object in the task gives you a few things: the task message stays small and serializable, the task always works on the object's current state, and you never operate on a stale copy that may have changed or been deleted in the meantime.

The only 'drawback' is that you need an extra, inexpensive SELECT query to refetch the data, which overall looks like a good deal compared to the issues above, doesn't it?