Spot the Difference: Celery Task Fails Randomly With No Errors

Some of my remote Celery tasks never seem to make it to my broker (RabbitMQ), and this appears to happen randomly. There are NO errors in my logs: the tasks never reach the workers, and they never fail either. Flower/RabbitMQ never reports a task failure.

I used tcpflow -p -c -i eth0 port 5672 to monitor the traffic on the API host that sends the tasks (the client).

When the API successfully sends a task, the outgoing traffic is recorded as follows:

(sensitive data removed)

192.018.000.002.42738-052.048.150.171.05672: AMQP
052.048.150.171.05672-192.018.000.002.42738:

capabilitiesFpublisher_confirmstexchange_exchange_bindingst
basic.nacktconsumer_cancel_notifytconnection.blockedtconsumer_prioritiestauthentication_failure_closetper_consumer_qostcluster_nameSrabbit@d8b85eb5ab91copyrightS.Copyright (C) 2007-2015 Pivotal Software, Inc.informationS5Licensed under the MPL.  See http://www.rabbitmq.com/platformS
Erlang/OTPproductSRabbitMQversionS3.6.0PLAIN AMQPLAINen_US
192.018.000.002.42738-052.048.150.171.05672:
nproductSpy-amqpproduct_versionS1.4.9capabilitiesF.connection.blockedtconsumer_cancel_notifytAMQPLAIN1LOGINSusernamePASSWORDSxxxxxxen_US
052.048.150.171.05672-192.018.000.002.42738:
<
192.018.000.002.42738-052.048.150.171.05672:

192.018.000.002.42738-052.048.150.171.05672:
(/
052.048.150.171.05672-192.018.000.002.42738:
)
192.018.000.002.42738-052.048.150.171.05672:

052.048.150.171.05672-192.018.000.002.42738:
192.018.000.002.42738-052.048.150.171.05672: $(
estimate_geometrydirect
052.048.150.171.05672-192.018.000.002.42738: (
192.018.000.002.42738-052.048.150.171.05672: 2
estimate_geometry
052.048.150.171.05672-192.018.000.002.42738: 2estimate_geometry
192.018.000.002.42738-052.048.150.171.05672: G2estimate_geometryestimate_geometrytasks.estimate_geometry
052.048.150.171.05672-192.018.000.002.42738: 2
192.018.000.002.42738-052.048.150.171.05672: 1<(estimate_geometrytasks.estimate_geometry
192.018.000.002.42738-052.048.150.171.05672: <application/x-python-serializebinary$021e5308-e6ac-43eb-9a06-8473ba386802$bedeb08f-9614-38b1-9b60-9eded43c3c71
192.018.000.002.42738-052.048.150.171.05672: }q(UexpiresqNUutcqUargsq]qCaUchordqNUcallbacksqNUerrbacksqNUtasksetqNUidq
Utasks.estimate_geometryqUtimelimitqNNUetaqNUkwargsq}qu.
192.018.000.002.42738-052.048.150.171.05672:  (
segment_imagedirect
052.048.150.171.05672-192.018.000.002.42738: (
192.018.000.002.42738-052.048.150.171.05672: 2
segment_image
segment_image71.05672-192.018.000.002.42738: 2
segment_imagetasks.segment_image0.171.05672: ;2
052.048.150.171.05672-192.018.000.002.42738: 2
segment_imagetasks.segment_image0.171.05672: )<(
192.018.000.002.42738-052.048.150.171.05672: <application/x-python-serializebinary$45280975-9611-41e1-bf99-388cdf1b7064$bedeb08f-9614-38b1-9b60-9eded43c3c71
192.018.000.002.42738-052.048.150.171.05672: }q(UexpiresqNUutcqUargsq]qCaUchordqNUcallbacksqNUerrbacksqNUtasksetqNUidq
Utasks.segment_imageqUtimelimitqNNUetaqNUkwargsq}qu.skq

This is a task that never makes it to the broker. Can anyone spot the difference and tell me what's wrong?

192.018.000.002.35908-052.017.119.221.05672: 1<(estimate_geometrytasks.estimate_geometry
192.018.000.002.35908-052.017.119.221.05672: <application/x-python-serializebinary$206c1ae0-43d0-4031-bac6-92d2df92b13c$f4c7420c-b9c2-3525-bd5a-d5955f884f43
192.018.000.002.35908-052.017.119.221.05672: }q(UexpiresqNUutcqUargsq]qCaUchordqNUcallbacksqNUerrbacksqNUtasksetqNUidq
Utasks.estimate_geometryqUtimelimitqNNUetaqNUkwargsq}qu.
segment_imagetasks.segment_image9.221.05672: )<(
192.018.000.002.35908-052.017.119.221.05672: <application/x-python-serializebinary$ce0da18a-6534-42d0-9919-cd2e85c8d5e9$f4c7420c-b9c2-3525-bd5a-d5955f884f43
192.018.000.002.35908-052.017.119.221.05672: }q(UexpiresqNUutcqUargsq]qCaUchordqNUcallbacksqNUerrbacksqNUtasksetqNUidq
Utasks.segment_imageqUtimelimitqNNUetaqNUkwargsq}qu.skq
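Comparing two captures by eye is error-prone. As a sketch, the helper below tallies how many tcpflow chunks travel in each direction, which makes it easy to see whether the broker sent anything back during a given publish. It assumes each capture is saved as plain text with the "srcip.srcport-dstip.dstport:" prefix that tcpflow -c prints, as in the dumps above; payload continuation lines, and lines whose prefix was overwritten in the console output (such as "segment_image71.05672-..."), are skipped.

```python
import re
from collections import Counter

# tcpflow -c prefixes each chunk with "srcip.srcport-dstip.dstport:", using
# zero-padded octets (e.g. 192.018.000.002.42738). Unprefixed lines are
# continuations of the previous chunk's payload and are ignored here.
FLOW_PREFIX = re.compile(r"^(\d{3}(?:\.\d{3}){3}\.\d{5})-(\d{3}(?:\.\d{3}){3}\.\d{5}):")


def count_directions(capture: str) -> Counter:
    """Count prefixed chunks per (source, destination) pair in a tcpflow -c dump."""
    counts = Counter()
    for line in capture.splitlines():
        match = FLOW_PREFIX.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "192.018.000.002.42738-052.048.150.171.05672: AMQP\n"
        "052.048.150.171.05672-192.018.000.002.42738:\n"
        "192.018.000.002.42738-052.048.150.171.05672: 2\n"
        "estimate_geometry\n"
    )
    for (src, dst), n in count_directions(sample).items():
        print(f"{src} -> {dst}: {n} chunk(s)")
```

Running it on the saved "working" and "failing" dumps side by side shows at a glance which directions of traffic are present in each.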

Additional Info:

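from celery import Celery
from django.conf import settings

celery_app = Celery('tasks')
celery_app.config_from_object('django.conf:settings')
celery_app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

# Send the task by name; `instance` is the model instance being processed
celery_app.send_task('tasks.estimate_geometry', args=[instance.id], kwargs={})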

Settings:

import os
from kombu import Exchange, Queue

BROKER_URL = 'amqp://xxxx:xxxx@xxxx:5672//'

CELERY_RESULT_BACKEND = "cache"
CELERY_CACHE_BACKEND = 'memcached://xxxxxx:11211'

CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE_TYPE = 'topic'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('estimate_geometry', Exchange('estimate_geometry'), routing_key='tasks.estimate_geometry'),
    Queue('segment_image', Exchange('segment_image'), routing_key='tasks.segment_image'),
    Queue('geometry_feature', Exchange('geometry_feature'), routing_key='tasks.geometry_feature'),
)
CELERY_ROUTES = {
    'tasks.estimate_geometry': {
        'queue': 'estimate_geometry',
        'routing_key': 'tasks.estimate_geometry',
    },
    'tasks.segment_image': {
        'queue': 'segment_image',
        'routing_key': 'tasks.segment_image',
    },
    'tasks.geometry_feature': {
        'queue': 'geometry_feature',
        'routing_key': 'tasks.geometry_feature',
    },
}


BROKER_HEARTBEAT = 10
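Since the symptom is messages that apparently never arrive, one quick sanity check is that every entry in CELERY_ROUTES names a declared queue and uses that queue's binding key. The sketch below copies the settings above into plain Python data, so it runs without Celery or kombu installed. It assumes the per-queue exchanges are direct exchanges, which is kombu's default Exchange type and matches the "...direct" strings visible in the captures. It only checks the static configuration, not what the broker actually has declared.

```python
# Plain-data mirror of CELERY_QUEUES above: (queue name, exchange name, binding key).
QUEUES = [
    ("default", "default", "default"),
    ("estimate_geometry", "estimate_geometry", "tasks.estimate_geometry"),
    ("segment_image", "segment_image", "tasks.segment_image"),
    ("geometry_feature", "geometry_feature", "tasks.geometry_feature"),
]

# Plain-data mirror of CELERY_ROUTES above.
ROUTES = {
    "tasks.estimate_geometry": {"queue": "estimate_geometry", "routing_key": "tasks.estimate_geometry"},
    "tasks.segment_image": {"queue": "segment_image", "routing_key": "tasks.segment_image"},
    "tasks.geometry_feature": {"queue": "geometry_feature", "routing_key": "tasks.geometry_feature"},
}


def check_routes(queues, routes):
    """Return a list of problems; an empty list means every route looks consistent.

    Assumes each queue's exchange is a direct exchange, so a message only reaches
    the queue if its routing key equals the queue's binding key exactly.
    """
    binding_keys = {name: key for name, _exchange, key in queues}
    problems = []
    for task_name, route in routes.items():
        queue = route.get("queue")
        if queue not in binding_keys:
            problems.append(f"{task_name}: queue {queue!r} is not declared")
            continue
        routing_key = route.get("routing_key")
        if routing_key is not None and routing_key != binding_keys[queue]:
            problems.append(
                f"{task_name}: routing key {routing_key!r} does not match "
                f"binding key {binding_keys[queue]!r}"
            )
    return problems


print(check_routes(QUEUES, ROUTES))  # [] for the settings shown above
```

For the settings in the question every route lines up, which suggests the intermittent loss is not caused by a static routing mismatch.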
asked Feb 12 '16 at 20:02 by Glyn Jackson


1 Answer

It looks like you took the wrong path in debugging the problem. I see this line:

segment_imagetasks.segment_image0.171.05672: )<(

in both logs.

If this is print output from your code, then the code is running, but incorrectly, so the trouble is not in the transport layer.

In the first case I can see these lines:

192.018.000.002.42738-052.048.150.171.05672:  (
segment_imagedirect
052.048.150.171.05672-192.018.000.002.42738: (
192.018.000.002.42738-052.048.150.171.05672: 2
segment_image
segment_image71.05672-192.018.000.002.42738: 2
segment_imagetasks.segment_image0.171.05672: ;2
052.048.150.171.05672-192.018.000.002.42738: 2

So sometimes it works fine.

You should run your code (tasks.estimate_geometry) manually with the instance IDs you are having trouble with.
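Running the task body in-process, outside the worker, surfaces any exception directly. A minimal sketch: run_for_ids is a hypothetical helper, and fake_estimate_geometry is a made-up stand-in that fails on odd IDs purely for demonstration. In a Django shell you would pass the real task function instead; calling a Celery task object directly executes it in the current process rather than sending it to the broker.

```python
import traceback
from typing import Callable, Dict, Iterable


def run_for_ids(task: Callable[[int], object], instance_ids: Iterable[int]) -> Dict[int, str]:
    """Call `task` synchronously for each id; return {id: traceback text} for failures."""
    failures = {}
    for instance_id in instance_ids:
        try:
            task(instance_id)
        except Exception:
            failures[instance_id] = traceback.format_exc()
    return failures


# Hypothetical stand-in for tasks.estimate_geometry, for demonstration only.
def fake_estimate_geometry(instance_id: int) -> int:
    if instance_id % 2:
        raise ValueError(f"cannot process instance {instance_id}")
    return instance_id


failures = run_for_ids(fake_estimate_geometry, [1, 2, 3, 4])
for instance_id, tb in failures.items():
    print(f"instance {instance_id} failed:\n{tb}")
```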

If this is a server log, maybe some C libraries are not compiled on the server, so your code works locally but not there. Also check the image formats.
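To follow up on the image formats, the files behind the failing instance IDs can be compared with working ones by their leading magic bytes, which does not depend on any compiled imaging library. A sketch using only the standard library; the signature list is an assumption covering a few common formats, since the question does not say which formats the tasks receive.

```python
from typing import Optional

# Leading byte signatures for a few common image formats (not exhaustive).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"BM", "bmp"),
]


def sniff_format(header: bytes) -> Optional[str]:
    """Return a format name if `header` starts with a known signature, else None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first bytes of the file at `path` and identify its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

A None result for a file that works locally would point at an unexpected or corrupted upload rather than at the broker.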

answered Sep 20 '22 at 14:09 by Yevgeniy Shchemelev