WorkerLostError('Worker exited prematurely: signal 15 (SIGTERM).',)

I recently started using celery in a new Django project. Settings:

 -------------- celery@123 v3.1.7 (Cipater) 
---- **** -----  
--- * ***  * -- Linux-3.8.11-ec2-x86_64-with-debian-squeeze-sid 
-- * - **** ---  
- ** ---------- [config] 
- ** ---------- .> app:         nextlanding_api:0x1c23250 
- ** ---------- .> transport:   redis://rediscloud@123123 
- ** ---------- .> results:     djcelery.backends.database:DatabaseBackend 
- *** --- * --- .> concurrency: 4 (prefork) 
-- ******* ----  
--- ***** ----- [queues] 
 -------------- .> celery           exchange=celery(direct) key=celery 

software -> celery:3.1.7 (Cipater) kombu:3.0.8 py:2.7.4
            billiard:3.3.0.13 redis:2.9.0
platform -> system:Linux arch:64bit, ELF imp:CPython
loader   -> celery.loaders.app.AppLoader
settings -> transport:redis results:djcelery.backends.database:DatabaseBackend 

I'm investigating an issue where tasks with an ETA 24+ hours in the future are disappearing (I've ensured the visibility_timeout is greater than 24 hours). When I do a warm shutdown of the worker, the log shows several unacknowledged messages being restored. Example: Restoring 26 unacknowledged message(s).
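For reference, here is a minimal sketch of how that visibility_timeout is set for the Redis broker in a Django settings module (Celery 3.1-style setting name; the exact value here is an assumption, not my production config):

    # settings.py
    BROKER_TRANSPORT_OPTIONS = {
        # Must exceed the longest task ETA, or the Redis transport may
        # restore and redeliver ETA messages while they are still pending.
        'visibility_timeout': 60 * 60 * 36,  # 36 hours
    }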

However, I expected around 50 unacknowledged messages to be restored. Looking at my logs a little more closely, I see:

[ERROR] celery.worker.job: Task myproj_task[xxx] raised unexpected: WorkerLostError('Worker exited prematurely: signal 15 (SIGTERM).',)
...
WorkerLostError: Worker exited prematurely: signal 15 (SIGTERM). 
Restoring 26 unacknowledged message(s). 
Process exited with status 0 

I've seen others report that OOM kills caused this for them. I'm on Heroku and see no R14 (Memory quota exceeded) errors.

One last bit of context: I'm spawning new processes from within my tasks.
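For illustration, a hypothetical shape of such a task (the task name, script, and arguments are made up, not my actual code):

    import subprocess
    from celery import shared_task

    @shared_task
    def myproj_scrape_task(url):
        # The spawned process joins the worker's process group, so a
        # signal sent to the whole group reaches it as well.
        subprocess.check_call(['python', 'scripts/fetch.py', url])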

My question is: is the WorkerLostError something I should worry about? The signal is 15 (SIGTERM), which seems OK. If this error is not normal, could it be a possible cause of the lost ETA tasks?

Edit

At first I thought items were disappearing, but after adding some verbose logging I can see the tasks were issued but never persisted in Redis:

myproj_email_task was sent. task_id: b6ce2b97-d5b8-4850-9e43-9185426cd9f6

However, looking over the tasks in Redis, task b6ce2b97-d5b8-4850-9e43-9185426cd9f6 does not exist.

So it would appear the tasks are not disappearing; they are either not being sent at all or not being put into the unacked Redis key.
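For completeness, this is roughly the kind of check involved (a sketch in Python 2 to match the question; it assumes kombu's default key names, where reserved-but-unacknowledged deliveries live in an 'unacked' hash and the default queue is a 'celery' list, so the raw payloads are scanned for the task id):

    import redis

    r = redis.StrictRedis.from_url('redis://localhost:6379/0')  # substitute your broker URL

    task_id = 'b6ce2b97-d5b8-4850-9e43-9185426cd9f6'
    in_unacked = any(task_id in payload for payload in r.hvals('unacked'))
    in_queue = any(task_id in msg for msg in r.lrange('celery', 0, -1))
    print('unacked: %s  queued: %s' % (in_unacked, in_queue))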

Scott Coates asked Jan 15 '14

1 Answer

The cause of these WorkerLostErrors is quite likely an incompatibility between the behaviors of Celery and Heroku:

  • A Celery worker expects SIGTERM to be delivered only to the parent worker process; it then lets its pool subprocesses finish their current tasks (a warm shutdown).
  • When doing a 'warm shutdown' of a dyno, Heroku sends a SIGTERM to all processes in the dyno.

Therefore, all the worker subprocesses receive the SIGTERM as well and immediately start terminating themselves, resulting in the WorkerLostErrors.
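To make the difference concrete, a small hypothetical sketch (the parent PID is supplied by hand; run one line or the other, not both):

    import os
    import signal
    import sys

    parent_pid = int(sys.argv[1])  # hypothetical: the worker parent's PID

    # What Celery expects: SIGTERM delivered to the parent only, which
    # then lets the pool children finish their current tasks.
    os.kill(parent_pid, signal.SIGTERM)

    # What a dyno shutdown effectively does: SIGTERM to the whole
    # process group, so the children receive it directly as well.
    # os.killpg(os.getpgid(parent_pid), signal.SIGTERM)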

A workaround has been prepared for the unreleased Celery 4.0: https://github.com/celery/celery/issues/2839

I haven't found a solution for 3.1.19 yet.
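For illustration only, here is the shape of one mitigation sometimes suggested for this class of problem: make the pool child processes ignore SIGTERM so that only the parent reacts to it and coordinates the warm shutdown. Treat this as an untested sketch, not a confirmed fix for 3.1.19:

    import signal
    from celery.signals import worker_process_init

    @worker_process_init.connect
    def ignore_sigterm_in_child(**kwargs):
        # Runs in each pool child after it is forked. With SIGTERM
        # ignored here, a group-wide SIGTERM only takes effect in the
        # parent, which can then perform its normal warm shutdown.
        signal.signal(signal.SIGTERM, signal.SIG_IGN)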

Henrik Heimbuerger answered Nov 01 '22