I'm trying to understand how and when tasks are cleaned up in Celery. From looking at the task docs I see that:
Old results will be cleaned automatically, based on the CELERY_TASK_RESULT_EXPIRES setting. By default this is set to expire after 1 day: if you have a very busy cluster you should lower this value.
But this quote is from the RabbitMQ Result Backend section, and I do not see any similar text in the Database Backend section. So my question is: is there a backend-agnostic approach I can take for cleaning up old tasks with Celery, and if not, is there a DB-backend-specific approach I should take? In case it makes any difference, I'm using django-celery. Thanks.
backend_cleanup cleans the DB even if CELERY_RESULT_EXPIRES is set to 0, if it was previously set to another value (#6295).
In Celery, a result backend is the place where a task's return value is stored once the task has run. Choosing the right result backend can potentially save you hours of pain later.
celery beat is a scheduler: it kicks off tasks at regular intervals, which are then executed by available worker nodes in the cluster. By default the entries are taken from the beat_schedule setting, but custom stores can also be used, such as storing the entries in a SQL database.
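As a rough illustration of what a beat_schedule entry looks like, here is a minimal configuration sketch. The entry name and the task path `tasks.add` are hypothetical placeholders, not anything from the question; the schedule is given as a plain number of seconds so the fragment needs no imports.

```python
# Sketch of a beat_schedule setting (e.g. in a celeryconfig-style module).
# "add-every-30-seconds" and "tasks.add" are hypothetical names for illustration.
beat_schedule = {
    "add-every-30-seconds": {
        "task": "tasks.add",   # dotted name of a registered task (assumed)
        "schedule": 30.0,      # run every 30 seconds
        "args": (16, 16),      # positional arguments passed to the task
    },
}
```

celery beat reads these entries and sends the task messages; a worker still has to be running to execute them.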
If you click on the link to the setting doc for CELERY_TASK_RESULT_EXPIRES:
http://docs.celeryproject.org/en/latest/userguide/configuration.html#result-expires
It does say that the database backend supports this, but you then need to run celery beat: there is a default periodic task, called every day, that removes expired results.
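For example, the expiry could be lowered in the settings module along these lines. This is only a sketch: it uses the setting name quoted in the question, and the 12-hour value is an arbitrary choice for illustration. Newer Celery versions spell the setting `result_expires`, so check the configuration docs for the version you run.

```python
from datetime import timedelta

# Setting name as quoted in the question; newer Celery releases use
# the lowercase name `result_expires` instead.
# 12 hours is an arbitrary example value, not a recommendation.
CELERY_TASK_RESULT_EXPIRES = timedelta(hours=12)
```

With the database backend, this setting only takes effect if celery beat is running, since the daily cleanup task is what actually deletes the expired rows.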
The backend section of the task docs should probably mention this as well, and maybe there should be a dedicated guide for backends too. If you want to lobby for this, please open an issue at https://github.com/celery/celery/issues