We have a Celery task that takes a Pandas dataframe as input. The dataframe is first serialized to JSON and then passed as an argument to the task. The dataframes can have around 35 thousand entries, which results in a JSON payload of about 700 kB. We are using Redis as the broker.
Unfortunately, the call to delay() on this task often takes too long (in excess of thirty seconds), and our web requests time out.
Is this the kind of scale that Redis and Celery should be able to handle? I presumed it was well within their limits and that the problem lay elsewhere, but I can't find any guidance or reports of similar experience online.
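For context, the pattern looks roughly like this (the task name process_dataframe and the exact pandas round-trip are simplified stand-ins for our actual code):

import io

import pandas as pd
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")


@app.task
def process_dataframe(df_json):
    # The worker rebuilds the dataframe from the JSON string it received.
    df = pd.read_json(io.StringIO(df_json))
    ...  # the actual processing


# In the web request:
#   process_dataframe.delay(df.to_json())   # df has ~35,000 rows, ~700 kB of JSON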
I would suggest saving the JSON in your database and passing the id to the Celery task instead of the whole JSON payload.
from django.db import models


class TodoTasks(models.Model):
    serialized_json = models.TextField()
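The view then only has to enqueue the primary key, and the worker fetches the JSON itself. A rough sketch, assuming a shared_task named process_todo_task and a pandas round-trip (adapt it to your actual task):

import io

import pandas as pd
from celery import shared_task

from .models import TodoTasks


@shared_task
def process_todo_task(todo_id):
    # Only the integer id travels through Redis; the payload stays in the database.
    todo = TodoTasks.objects.get(pk=todo_id)
    df = pd.read_json(io.StringIO(todo.serialized_json))
    ...  # process the dataframe


# In the view:
#   todo = TodoTasks.objects.create(serialized_json=df.to_json())
#   process_todo_task.delay(todo.pk)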
Moreover, you can keep track of the status of the task with a few extra fields, and even store the error traceback (which I find very useful for debugging):
import traceback

from django.db import models


class TodoTasks(models.Model):
    class StatusChoices(models.TextChoices):
        PENDING = "PENDING", "Awaiting celery to process the task"
        SUCCESS = "SUCCESS", "Task done with success"
        FAILED = "FAILED", "Task failed to be processed"

    serialized_json = models.TextField()
    status = models.CharField(
        max_length=10, choices=StatusChoices.choices, default=StatusChoices.PENDING
    )
    created_date = models.DateTimeField(auto_now_add=True)
    processed_date = models.DateTimeField(null=True, blank=True)
    error = models.TextField(null=True, blank=True)

    def handle_exception(self):
        # Store the traceback of the exception currently being handled.
        self.error = traceback.format_exc()
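The task can then update these fields as it runs; a possible usage, assuming the same process_todo_task sketched above:

from celery import shared_task
from django.utils import timezone

from .models import TodoTasks


@shared_task
def process_todo_task(todo_id):
    todo = TodoTasks.objects.get(pk=todo_id)
    try:
        ...  # do the actual work on todo.serialized_json
        todo.status = TodoTasks.StatusChoices.SUCCESS
    except Exception:
        # Stores the traceback in the error field; mark the record as failed too.
        todo.handle_exception()
        todo.status = TodoTasks.StatusChoices.FAILED
    todo.processed_date = timezone.now()
    todo.save()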