 

Celery is rerunning long running completed tasks over and over

I have a Python Celery queue, backed by Redis, that processes uploads and downloads of many gigabytes of data at a time.

A few of the uploads take up to a few hours. However, once such a task finishes, I'm seeing bizarre behaviour: Celery reruns the task that just completed by sending it to the worker again (I'm running a single worker). It has already happened twice on the same task!

Can someone help me understand why this is happening and how I can prevent it?

The tasks definitely finish cleanly with no errors reported; they are just extremely long-running.

asked Jan 09 '23 07:01 by user2252999



1 Answer

I recently ran into this issue and eventually figured out that tasks were running multiple times because of a combination of task prefetching and tasks exceeding the visibility timeout. Tasks are acknowledged right before they are executed (unless you set CELERY_ACKS_LATE = True, task_acks_late in Celery 4+), and by default 4 tasks are prefetched per worker process. The first task is acknowledged before it starts, but the other prefetched tasks stay unacknowledged while it runs. If it runs longer than the visibility timeout (one hour by default with the Redis broker), the Redis transport treats those still-unacknowledged messages as lost and puts them back on the queue. They are then delivered to another worker and executed an additional time (or, in your case, executed an additional time by the same worker).
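To make the mechanism concrete, here is a toy simulation of a single worker process with early acknowledgement. It is a simplified model written for illustration, not Celery or Kombu code: the function name find_redelivered and the assumption that the worker keeps "prefetch" messages reserved while one task runs are mine, and all times are in hours.

```python
from collections import deque


def find_redelivered(durations, prefetch, visibility_timeout):
    """Return indices of tasks whose message stays unacknowledged longer than
    visibility_timeout, in a toy model of ONE worker process with early acks.

    Assumed model (a simplification, not Celery/Kombu source):
    - besides the task it is executing, the worker holds up to `prefetch`
      messages that were delivered but not yet acknowledged;
    - a message is acknowledged the moment its task starts (early ack);
    - the broker redelivers any message left unacknowledged longer than
      visibility_timeout, so that task ends up running twice.
    """
    pending = deque(range(len(durations)))  # messages still waiting on the broker
    reserved = deque()                      # (task index, delivery time) held by the worker
    now = 0.0
    redelivered = []

    def refill():
        # Fetch more messages until the prefetch limit is reached again.
        while pending and len(reserved) < prefetch:
            reserved.append((pending.popleft(), now))

    refill()
    while reserved:
        idx, delivered_at = reserved.popleft()
        if now - delivered_at > visibility_timeout:
            redelivered.append(idx)  # broker already put this message back
        refill()                     # early ack frees a prefetch slot
        now += durations[idx]        # run the task to completion
    return redelivered


# Five 2-hour uploads, default multiplier of 4, default 1-hour timeout:
print(find_redelivered([2] * 5, prefetch=4, visibility_timeout=1))   # [1, 2, 3, 4]
# Same workload with a 10-hour timeout: nothing is redelivered.
print(find_redelivered([2] * 5, prefetch=4, visibility_timeout=10))  # []
```

Only the first task is safe in the default case, because it is acknowledged the instant it is fetched; every prefetched task behind it waits at least one 2-hour upload before being acknowledged.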

You can solve this by increasing the visibility timeout to something longer than the longest possible runtime of your tasks:

BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 3600*10}  # 10 hours
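How much headroom is enough depends on how many prefetched messages can queue up behind a running task. A rough bound, under the same simplified single-process, early-ack model (my assumption, not a formula from the Celery docs; the function name max_unacked_wait is hypothetical): the last reserved message waits behind at most "prefetch multiplier" tasks before it starts and is acknowledged, so the visibility timeout should be set above that product.

```python
def max_unacked_wait(longest_runtime_s, prefetch_multiplier=4):
    """Upper bound (seconds) on how long a reserved message can sit
    unacknowledged, in a simplified model: one worker process, early acks,
    and `prefetch_multiplier` messages held while a task runs.
    The visibility timeout should be set above this value."""
    if prefetch_multiplier < 1:
        raise ValueError("this model assumes a prefetch multiplier of at least 1")
    return prefetch_multiplier * longest_runtime_s


HOUR = 3600
# Hypothetical workload: the longest upload takes 3 hours.
print(max_unacked_wait(3 * HOUR))                         # 43200 (12 h with the default multiplier of 4)
print(max_unacked_wait(3 * HOUR, prefetch_multiplier=1))  # 10800 (3 h)
```

With a multiplier of 1, the bound reduces to the longest task runtime itself, which is exactly the "longer than the longest possible runtime" rule.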

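You could also set CELERYD_PREFETCH_MULTIPLIER = 1 (worker_prefetch_multiplier in Celery 4+) to limit prefetching to one extra task per worker process, so that long-running tasks don't leave a backlog of prefetched tasks waiting unacknowledged.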
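Both settings together might look like this in a Celery configuration module. This is a sketch: the module name celeryconfig and the Redis URL are assumptions, and it uses the lowercase setting names introduced in Celery 4 (older versions use BROKER_TRANSPORT_OPTIONS and CELERYD_PREFETCH_MULTIPLIER; avoid mixing the two styles in one module).

```python
# celeryconfig.py -- hypothetical module name; load it on your app with
# app.config_from_object("celeryconfig").

broker_url = "redis://localhost:6379/0"  # assumption: local Redis, database 0

# Must exceed the longest time a message can sit reserved but unacknowledged.
broker_transport_options = {"visibility_timeout": 10 * 3600}  # 10 hours

# Reserve at most one extra task per worker process.
worker_prefetch_multiplier = 1
```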

answered Jan 16 '23 20:01 by Jason V.
