
Debugging celery WorkerLostError with exitcode zero (Django 1.5.5 + celery 3.1.8 + RabbitMQ 3.1.3 on Heroku)

My platform runs through a lot of tasks (several thousand per day). Some of the longer tasks keep failing with the following error:

Traceback (most recent call last):
  File "/app/.heroku/python/lib/python2.7/site-packages/billiard/pool.py", line 1167, in mark_as_worker_lost
    human_status(exitcode)),
WorkerLostError: Worker exited prematurely: exitcode 0.

According to Celery's Flower, which doesn't provide anything more than the posted traceback, the task was received (2014-12-22 22:46:46.196814) a little over three minutes before it was started (2014-12-22 22:50:03.469647), and it failed just ten seconds later (epoch 1419288613.34, or 2014-12-22 22:50:13).
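For reference, the gaps between those three timestamps can be checked with a few lines of Python. This is only a sketch: it assumes the Flower timestamps are in UTC, and it uses Python 3's datetime.timezone rather than the Python 2.7 runtime shown in the traceback. It puts the wait before the task started at roughly 3 minutes 17 seconds, and the run time before the failure at just under ten seconds.

```python
from datetime import datetime, timezone

# Timestamps copied from the Flower output quoted above.
# Treating them as UTC is an assumption.
received = datetime(2014, 12, 22, 22, 46, 46, 196814, tzinfo=timezone.utc)
started = datetime(2014, 12, 22, 22, 50, 3, 469647, tzinfo=timezone.utc)
failed = datetime.fromtimestamp(1419288613.34, tz=timezone.utc)

# Time between the worker receiving the task and starting it.
queue_wait = (started - received).total_seconds()
# Time between the task starting and the WorkerLostError being recorded.
run_time = (failed - started).total_seconds()

print("received -> started: %.1f s" % queue_wait)
print("started -> failed:   %.1f s" % run_time)
```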

This has been a recurring problem on my platform. It happens mostly with tasks that run Scrapy 0.24.2, but it may also happen with other tasks.

Other occurrences of WorkerLostError (also with an exit code of zero) have had durations of three, five, or seven minutes.

Any thoughts on what could be causing this? All tasks run perfectly fine locally. Thanks.

ChrisR asked Dec 04 '22 04:12



1 Answer

My recommendation is to check your own code and all of the modules you are using for 'raise BaseException'. I ran into the same issue with WorkerLostError and exitcode 0.

After a lot of debugging and figuring out specifically where tasks were failing, I found that it happened whenever BaseException was raised. Instead of the task's error message being reported, WorkerLostError occurred.

By changing to 'raise Exception', the actual error message was provided when something went wrong inside the task. This might not be the same for your case, but it was what I found when dealing with the same error.
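The difference between the two can be sketched in plain Python. The run_task wrapper below is a hypothetical stand-in, not Celery's or billiard's actual code; it assumes only that the worker reports task errors through an 'except Exception' handler. Exception is a subclass of BaseException, so such a handler never sees a bare BaseException, which propagates past it instead.

```python
def run_task(task):
    """Hypothetical worker wrapper; NOT Celery's or billiard's real code."""
    try:
        return ("success", task())
    except Exception as exc:
        # Only Exception and its subclasses are caught and reported here.
        return ("failure", str(exc))


def raises_exception():
    raise Exception("bad input")


def raises_base_exception():
    raise BaseException("bad input")


# The Exception is caught, so its message is reported as a task failure.
print(run_task(raises_exception))

# The BaseException is not caught by 'except Exception' and leaves run_task.
try:
    run_task(raises_base_exception)
except BaseException as exc:
    print("escaped the handler:", str(exc))
```

If nothing further up catches the BaseException, it would presumably take the process down without the message ever being recorded, which fits the behaviour I saw.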

tony-ch answered Feb 15 '23 04:02
