First of all, please don't consider this question a duplicate of this question.
I have set up an environment which uses celery with redis as both the broker and the result_backend. My question is: how can I make sure that when the celery workers crash, all the scheduled tasks are retried once the celery worker is back up?
I have seen advice on using CELERY_ACKS_LATE = True, so that the broker will re-drive the tasks until it gets an ACK, but in my case it's not working. Whenever I schedule a task, it immediately goes to the worker, which holds on to it until the scheduled time of execution. Let me give an example:
I am scheduling a task like this: res = test_task.apply_async(countdown=600), but immediately in the celery worker logs I can see something like: Got task from broker: test_task[a137c44e-b08e-4569-8677-f84070873fc0] eta:[2013-01-...]. Now when I kill the celery worker, these scheduled tasks are lost. My settings:
BROKER_URL = "redis://localhost:6379/0"
CELERY_ALWAYS_EAGER = False
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
CELERY_ACKS_LATE = True
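For context, a minimal reproduction along these lines might look like the following (the module name tasks.py and the trivial task body are illustrative, not from the original setup):

```python
# tasks.py -- minimal sketch reproducing the observed behaviour
from celery import Celery

app = Celery("tasks")
app.conf.update(
    BROKER_URL="redis://localhost:6379/0",
    CELERY_RESULT_BACKEND="redis://localhost:6379/0",
    CELERY_ALWAYS_EAGER=False,
    CELERY_ACKS_LATE=True,
)

@app.task
def test_task():
    return "done"  # placeholder body

if __name__ == "__main__":
    # Schedule 10 minutes ahead. The worker reserves the message
    # immediately and keeps it in memory until the ETA, which is why
    # "Got task from broker: ... eta:[...]" appears in the log right away.
    res = test_task.apply_async(countdown=600)
    print(res.id)
```

Run the worker with `celery -A tasks worker --loglevel=info`, then kill it (e.g. with kill -9) before the ETA; the reserved task does not come back when the worker restarts, which is the behaviour described above.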
Apparently this is how celery behaves: when a worker is abruptly killed (but the dispatching process isn't), the message will be considered 'failed' even though you have acks_late=True.
The motivation (to my understanding) is that if the consumer was killed by the OS due to an out-of-memory condition, there is no point in redelivering the same task.
You can see the exact issue here: https://github.com/celery/celery/issues/1628
I actually disagree with this behaviour. IMO it would make more sense not to acknowledge.
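For what it's worth, later Celery releases (4.0+) added a reject_on_worker_lost option aimed at exactly this case: combined with acks_late, the message is requeued instead of acknowledged when the worker process is killed. A minimal sketch using the Celery 4.x lowercase setting names (check your version's docs; note the Celery docs warn this can cause redelivery loops if a task reliably crashes its worker):

```python
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

# Celery 4.x setting names; only meaningful together with acks_late.
app.conf.task_acks_late = True
app.conf.task_reject_on_worker_lost = True

# The same pair can also be set per task:
@app.task(acks_late=True, reject_on_worker_lost=True)
def test_task():
    return "done"  # placeholder body
```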