I am running Celery workers on Heroku and one of the tasks hit a timeout limit. When I retried it manually everything worked fine, so it was probably a connection issue. I am using RabbitMQ as a broker and Celery is configured to acknowledge tasks late (CELERY_ACKS_LATE=True). I expected the task to be returned to the RabbitMQ queue and processed again by another worker, but that didn't happen. Do I need to configure anything else for a task to return to the RabbitMQ queue when a worker times out?
Here are the logs:
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.4/site-packages/billiard/pool.py", line 639, in on_hard_timeout
raise TimeLimitExceeded(job._timeout)
billiard.exceptions.TimeLimitExceeded: TimeLimitExceeded(60,)
[2015-09-02 06:22:14,504: ERROR/MainProcess] Hard time limit (60s) exceeded for simulator.tasks.run_simulations[4e269d24-87a5-4038-b5b5-bc4252c17cbb]
[2015-09-02 06:22:18,877: INFO/MainProcess] missed heartbeat from celery@420cc07b-f5ba-4226-91c9-84a949974daa
[2015-09-02 06:22:18,922: ERROR/MainProcess] Process 'Worker-1' pid:9 exited with 'signal 9 (SIGKILL)'
Installation & configuration

We will install Celery using pip. We don't use sudo, because we are installing Celery into our virtual environment. However, we also need to install RabbitMQ on the system, since it runs in the background. The -detached option allows us to run rabbitmq-server in the background.
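As a minimal sketch of the configuration side (assuming RabbitMQ is running locally on its default port; the app name and guest credentials below are illustrative, not taken from the question), the Celery app can be pointed at the broker like this:

    from celery import Celery

    # RabbitMQ started in the background, e.g. with `rabbitmq-server -detached`,
    # listening on the default port with the default guest credentials.
    app = Celery("simulator", broker="amqp://guest:guest@localhost:5672//")

    # Acknowledge tasks only after they finish, as in the question above.
    app.conf.update(CELERY_ACKS_LATE=True)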
By default, Celery routes all tasks to a single queue and all workers consume this default queue.
This way, you delegate queue creation to Celery. You can use apply_async with any queue and Celery will handle it, provided your task is aware of the queue used by apply_async. If none is provided, the worker will listen only on the default queue.
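A rough sketch of what that looks like (the task name, queue name, and module name here are made up for the example):

    from celery import Celery

    app = Celery("simulator", broker="amqp://guest:guest@localhost:5672//")

    @app.task
    def run_simulations(params):
        ...  # placeholder for the real work

    # Send this call to a dedicated "simulations" queue; the queue is
    # created on demand if it does not exist yet.
    run_simulations.apply_async(args=[{"runs": 100}], queue="simulations")

    # A worker then has to be told to consume that queue, e.g.:
    #   celery -A celery_app worker -Q simulations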
Looks like you're hitting Celery time limits. http://docs.celeryproject.org/en/latest/userguide/workers.html#time-limits
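Time limits can also be set per task. A hedged sketch (the limit values and helper names are arbitrary examples, not from the question):

    from celery import Celery
    from celery.exceptions import SoftTimeLimitExceeded

    app = Celery("simulator", broker="amqp://guest:guest@localhost:5672//")

    @app.task(soft_time_limit=50, time_limit=60)  # seconds
    def run_simulations(params):
        try:
            do_work(params)  # placeholder for the actual simulation
        except SoftTimeLimitExceeded:
            # The soft limit raises an exception the task can catch for
            # cleanup; the hard limit (as in the log above) kills the
            # worker process with SIGKILL instead.
            clean_up(params)  # hypothetical cleanup helper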
Celery doesn't implement retry logic for tasks by default because it doesn't know if retries are safe for your tasks. Namely, your task needs to be idempotent for retries to be safe.
Thus any retries due to task failures should be made in the task. See the example here: http://docs.celeryproject.org/en/latest/reference/celery.app.task.html#celery.app.task.Task.retry
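For example, a hedged sketch of in-task retries around a flaky network call, using the requests library (the URL parameter, timeout, and retry counts are illustrative assumptions, not from the question):

    import requests
    from celery import Celery

    app = Celery("simulator", broker="amqp://guest:guest@localhost:5672//")

    @app.task(bind=True, max_retries=3, default_retry_delay=10)
    def fetch_and_simulate(self, url):
        try:
            # Short connection timeout so a hung connection fails fast
            # instead of running into the 60 s hard time limit.
            response = requests.get(url, timeout=5)
            response.raise_for_status()
        except requests.RequestException as exc:
            # Re-queue this task; once max_retries is exhausted the
            # exception is re-raised and the task is marked as failed.
            raise self.retry(exc=exc)
        return response.json()  # hand the data on to the real simulation code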
There are a few reasons why your task could have timed out, but you'd know best: it may have been taking too long to process the data, or too long to fetch it.
If you believe the task is failing while trying to connect to some service, I suggest decreasing the connection timeout interval and adding retry logic in your task. If your task is taking too long to process data, try splitting your data into chunks and processing it that way. Celery has nice support for this: http://docs.celeryproject.org/en/latest/userguide/canvas.html#chunks
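A sketch of how chunks could be applied here (the task name, argument range, and chunk size are made up for illustration):

    from celery import Celery

    app = Celery("simulator", broker="amqp://guest:guest@localhost:5672//")

    @app.task
    def run_one_simulation(params):
        ...  # placeholder for a single, small unit of work

    # Split 1000 argument tuples into tasks of 10 items each, so no single
    # task is likely to run into the 60 s hard time limit.
    job = run_one_simulation.chunks(((p,) for p in range(1000)), 10)
    job.apply_async()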