Django-celery and RabbitMQ not executing tasks

We've got a Django 1.3 application with django-celery 2.5.5 that has been running fine in production for months, but all of a sudden one of the Celery tasks is now failing to execute.

The RabbitMQ broker and Celery workers are running on a separate machine and celeryconfig.py is configured to use that particular RabbitMQ instance as the backend.

On the application server I tried to manually fire off the celery task via python manage.py shell.

The actual task is called like so:

>>> result = tasks.runCodeGeneration.delay(code_generation, None)
>>> result
<AsyncResult: 853daa7b-8be5-4a25-a1d0-1552b38a0d21>
>>> result.state
'PENDING'

It returns an AsyncResult as expected, but its state stays 'PENDING' forever.
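One way to check whether the workers ever saw the message is to ask them directly over the broker's control channel. A minimal sketch from python manage.py shell, assuming the Celery 2.5-era import path (the exact API may differ between versions):

# Ask the running workers what they have received (Celery 2.x control API).
from celery.task.control import inspect

i = inspect()
print i.active()     # tasks each worker is currently executing
print i.reserved()   # tasks received by a worker but not yet started
print i.scheduled()  # tasks waiting on an ETA/countdown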

To see if the RabbitMQ broker received the message I ran the following:

$ rabbitmqctl list_queues name messages messages_ready messages_unacknowledged | grep 853daa
853daa7b8be54a25a1d01552b38a0d21        0       0       0

I'm not sure what this means. RabbitMQ certainly seems to have received some sort of request, otherwise how else could a queue have been created for the task with id 853daa7b8be54a25a1d01552b38a0d21? It just doesn't seem to hold any messages.
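If I understand the amqp result backend correctly, the queue named after the task id is only the reply queue the client declares to hold the eventual result, so it being empty says nothing about whether the task message ever reached the workers; the task itself should sit in the worker queue ("celery" by default). A minimal sketch for peeking at that queue with the same broker credentials, assuming the kombu 2.x API that ships alongside django-celery 2.5 (the hostname below is a placeholder):

# Check the default task queue rather than the per-result queue.
# "rabbitmq-host" is a placeholder for the machine running the broker;
# user/password/vhost mirror celeryconfig.py.
from kombu import BrokerConnection

conn = BrokerConnection(hostname="rabbitmq-host", userid="console",
                        password="console", virtual_host="console")
chan = conn.channel()
# passive=True only inspects the queue; it returns
# (queue_name, message_count, consumer_count) without creating anything.
print chan.queue_declare(queue="celery", passive=True)
conn.close()

rabbitmqctl list_queues -p console would show the same counts from the broker side.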

I've tried restarting both Celery and RabbitMQ and the problem persists.

Celery is run like so: $ python /home/[project]/console/manage.py celeryd -B -c2 --loglevel=INFO

Note that the celerybeat/scheduled tasks seem to be running just fine.

[EDIT]: There is no separate RabbitMQ configuration file; the settings are passed inline by the init.d script: /usr/lib/erlang/erts-5.8.5/bin/beam.smp -W w -K true -A30 -P 1048576 -- -root /usr/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -noshell -noinput -sname rabbit@hostname -boot /var/lib/rabbitmq/mnesia/rabbit@hostname-plugins-expand/rabbit -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@hostname.log"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@hostname-sasl.log"} -os_mon start_cpu_sup true -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@hostname"

[EDIT2]: Here's the celeryconfig we're using for the workers. The same config is used for the producer, except that localhost is of course replaced with the host running the RabbitMQ broker.

from datetime import timedelta

BROKER_HOST = "localhost"

BROKER_PORT = 5672
BROKER_USER = "console"
BROKER_PASSWORD = "console"

BROKER_VHOST = "console"
BROKER_URL = "amqp://guest:guest@localhost:5672//"

CELERY_RESULT_BACKEND = "amqp"

CELERY_IMPORTS = ("tasks", )

CELERYD_HIJACK_ROOT_LOGGER = True
CELERYD_LOG_FORMAT = "[%(asctime)s: %(levelname)s/%(processName)s/%(name)s] %(message)s"


CELERYBEAT_SCHEDULE = {
    "runs-every-60-seconds": {
        "task": "tasks.runMapReduceQueries",
        "schedule": timedelta(seconds=60),
        "args": ()
    },
}
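For reference, the same connection can also be expressed as a single URL; a minimal sketch of the BROKER_URL equivalent of the BROKER_* values above (the hostname is a placeholder for the broker machine):

# URL form of the BROKER_* settings above; "rabbitmq-host" is a placeholder.
BROKER_URL = "amqp://console:console@rabbitmq-host:5672/console"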

[EDIT3]: Our infrastructure is set up with the Celery workers and RabbitMQ broker on one machine and the django-celery application (the producer) on another.

asked Dec 31 '13 by sw00


1 Answer

We solved the problem.

There was a long-running celerybeat task (~430 seconds) that was scheduled to run every 60 seconds. With only two worker processes (-c2), the overlapping runs hogged the entire pool, so the other tasks never got picked up.
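For anyone else who hits this, one way to keep a slow periodic task from starving the rest of the pool is to give it its own queue and worker. A minimal sketch, assuming Celery 2.5-era setting names (the queue name is illustrative):

# Route the slow periodic task to a dedicated queue so it cannot occupy
# the whole prefork pool ("mapreduce" is an illustrative queue name).
CELERY_ROUTES = {
    "tasks.runMapReduceQueries": {"queue": "mapreduce"},
}

# Optionally cap its runtime (seconds) so a hung run gets killed.
CELERYD_TASK_TIME_LIMIT = 600

# Then run one worker per queue, e.g.:
#   python manage.py celeryd -Q mapreduce -c1 --loglevel=INFO
#   python manage.py celeryd -Q celery -c2 -B --loglevel=INFO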

answered Oct 22 '22 by sw00