Is there a way for a Celery job to be retried if the server where the worker is running dies? I don't just mean the sub-process that executes the job, but the entire server becoming unavailable.
I tried with RabbitMQ and Redis as brokers. In both cases, if a job is currently being processed, it is entirely forgotten. When a worker restarts, it doesn't even try to reprocess the job, and looking at Rabbit or Redis, their queues are empty. The result backend is also empty.
It looks like the worker grabs the message and assumes it will put it back if the subprocess fails, but if the worker itself dies too, it can't put it back.
(yes, I work in an environment where this happens more than once a year, and I don't want to lose tasks)
In theory, setting task_acks_late=True should do the trick. (doc)
With a Redis broker, the task will be redelivered after the visibility_timeout, which defaults to one hour. (doc)
With RabbitMQ, the task is redelivered as soon as Rabbit notices that the worker died.
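As a minimal sketch of what that configuration could look like (the app name, broker URL, and task body are illustrative assumptions, not taken from your setup):

```python
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

app.conf.update(
    # Acknowledge the message only after the task finishes, so that a
    # message left unacked by a dead worker host gets redelivered.
    task_acks_late=True,
    # Redis broker only: how long a delivered-but-unacked message stays
    # invisible before the broker hands it to another worker. Defaults
    # to 3600 seconds (one hour); keep it longer than your longest task.
    broker_transport_options={"visibility_timeout": 3600},
)

@app.task
def process(item_id):
    # Illustrative task body.
    ...
```

With RabbitMQ the broker_transport_options line isn't needed; task_acks_late alone is enough, since Rabbit redelivers unacked messages as soon as the connection to the dead worker drops.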