I recently started using Celery in a new Django project. Here is the worker startup banner and report:
-------------- celery@123 v3.1.7 (Cipater)
---- **** -----
--- * *** * -- Linux-3.8.11-ec2-x86_64-with-debian-squeeze-sid
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: nextlanding_api:0x1c23250
- ** ---------- .> transport: redis://rediscloud@123123
- ** ---------- .> results: djcelery.backends.database:DatabaseBackend
- *** --- * --- .> concurrency: 4 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> celery exchange=celery(direct) key=celery
software -> celery:3.1.7 (Cipater) kombu:3.0.8 py:2.7.4
billiard:3.3.0.13 redis:2.9.0
platform -> system:Linux arch:64bit, ELF imp:CPython
loader -> celery.loaders.app.AppLoader
settings -> transport:redis results:djcelery.backends.database:DatabaseBackend
I'm investigating an issue where tasks with an ETA 24+ hours in the future are disappearing (I've ensured the visibility_timeout is greater than 24 hours; the transport options are shown just after the example). When I do a warm shutdown of the worker, the log shows several unacknowledged messages being restored. Example:
Restoring 26 unacknowledged message(s).
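For reference, this is roughly how the visibility timeout is set in my Django settings (a sketch, not my exact values; the 48-hour number and the REDISCLOUD_URL variable name are placeholders):
# settings.py (sketch) -- Celery 3.1-style settings for the Redis transport.
# The visibility_timeout must be longer than the longest ETA/countdown used,
# otherwise the Redis transport re-delivers unacked messages to another worker.
import os

BROKER_URL = os.environ['REDISCLOUD_URL']
BROKER_TRANSPORT_OPTIONS = {
    'visibility_timeout': 60 * 60 * 48,  # 48 hours, comfortably above 24h ETAs
}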
However, I expected roughly 50 unacknowledged messages to be restored, not 26. Looking at my logs a little more closely, I see:
[ERROR] celery.worker.job: Task myproj_task[xxx] raised unexpected: WorkerLostError('Worker exited prematurely: signal 15 (SIGTERM).',)
...
WorkerLostError: Worker exited prematurely: signal 15 (SIGTERM).
Restoring 26 unacknowledged message(s).
Process exited with status 0
I've seen others report that the OOM killer terminating their processes causes this error. I'm on Heroku and see no R14 (memory quota exceeded) codes.
One last bit of context: I'm spawning new processes from within my tasks, roughly like the sketch below.
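To illustrate what I mean (a made-up sketch; the task name and the command are placeholders, not my real code):
# tasks.py (illustrative only)
import subprocess

from celery import shared_task

@shared_task
def run_external_job():
    # The task spawns a child process and blocks until it finishes.
    proc = subprocess.Popen(['python', 'scripts/do_work.py'])
    proc.wait()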
My question is: is the WorkerLostError something I should worry about? The signal is 15 (SIGTERM) and the process exited with status 0, which seems OK. If this error is not normal, could it be a cause of the lost ETA tasks?
Edit
At first I thought tasks were disappearing, but after adding some verbose logging I can see that the tasks were issued but never persisted in Redis:
myproj_email_task was sent. task_id: b6ce2b97-d5b8-4850-9e43-9185426cd9f6
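That log line comes from logging the AsyncResult returned by apply_async, roughly like this (a reconstruction; the module path and the ETA value are placeholders):
# Sketch of the "was sent" logging around the point of dispatch.
import logging
from datetime import datetime, timedelta

from myproj.tasks import myproj_email_task  # hypothetical module path

logger = logging.getLogger(__name__)

result = myproj_email_task.apply_async(eta=datetime.utcnow() + timedelta(hours=25))
logger.info('myproj_email_task was sent. task_id: %s', result.id)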
However, looking over the tasks in Redis, the task b6ce2b97-d5b8-4850-9e43-9185426cd9f6 does not exist. So it would appear the tasks are not disappearing, but are either not being sent at all or not being put into the unacked Redis key.
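For completeness, this is roughly how I'm checking Redis (a sketch; the 'celery' list and 'unacked' hash names, and the base64 envelope layout, are my understanding of kombu's Redis transport defaults rather than anything specific to my project):
# Look for a task id in the default queue list and in the unacked hash.
import base64
import json

import redis

r = redis.StrictRedis.from_url('redis://localhost:6379/0')  # replace with the Rediscloud URL
task_id = 'b6ce2b97-d5b8-4850-9e43-9185426cd9f6'

def body_of(envelope):
    # kombu's JSON envelope stores the task body base64-encoded under 'body'.
    return base64.b64decode(envelope['body'])

# Queued-but-undelivered messages live in a list named after the queue ('celery').
in_queue = any(task_id in body_of(json.loads(m)) for m in r.lrange('celery', 0, -1))

# Delivered-but-unacked messages live in the 'unacked' hash; each value appears
# to be a JSON list whose first element is the message envelope.
in_unacked = any(task_id in body_of(json.loads(v)[0]) for v in r.hvals('unacked'))

print('in queue: %s, in unacked: %s' % (in_queue, in_unacked))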
Answer
The cause of these WorkerLostErrors is quite likely an incompatibility between the behaviors of Celery and Heroku: when Heroku shuts down a dyno it sends SIGTERM to every process in the dyno, while Celery's prefork pool expects only the master process to receive SIGTERM and to coordinate the warm shutdown of its child processes. Therefore, all the worker subprocesses get the SIGTERM as well and start terminating immediately, resulting in the WorkerLostErrors.
A workaround has been prepared for the unreleased Celery 4.0: https://github.com/celery/celery/issues/2839
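From my reading of that thread, the workaround is an environment variable that remaps how the worker handles SIGTERM, which on Heroku would be used roughly like this (a sketch; the REMAP_SIGTERM variable and the SIGQUIT value are my understanding of the issue discussion, not verified against a released version):
# Procfile (sketch) -- remap SIGTERM so the worker does a cold shutdown on dyno restart
worker: REMAP_SIGTERM=SIGQUIT celery -A nextlanding_api worker --loglevel=info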
I haven't found a solution for 3.1.19 yet.