 

Django Celery ConnectionError: Too many heartbeats missed

Question

How can I fix the "ConnectionError: Too many heartbeats missed" error from Celery?

Example Error

[2013-02-11 15:15:38,513: ERROR/MainProcess] Error in timer: ConnectionError('Too many heartbeats missed', None, None, None, '')
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/utils/timer2.py", line 97, in apply_entry
    entry()
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/utils/timer2.py", line 51, in __call__
    return self.fun(*self.args, **self.kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/utils/timer2.py", line 153, in _reschedules
    return fun(*args, **kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/kombu/connection.py", line 265, in heartbeat_check
    return self.transport.heartbeat_check(self.connection, rate=rate)
  File "/app/.heroku/python/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 134, in heartbeat_check
    return connection.heartbeat_tick(rate=rate)
  File "/app/.heroku/python/lib/python2.7/site-packages/amqp/connection.py", line 837, in heartbeat_tick
    raise ConnectionError('Too many heartbeats missed')
ConnectionError: Too many heartbeats missed
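Reading the traceback from the bottom up: the worker's timer (celery/utils/timer2.py) periodically calls kombu's heartbeat_check, which hands off to py-amqp's Connection.heartbeat_tick, and that method raises the error once it decides the broker has been silent for too long. The sketch below is a deliberately simplified, hypothetical model of that kind of check, not py-amqp's actual implementation. The class name HeartbeatMonitor, the max_missed threshold and the "total bytes received" input are all assumptions made for illustration.

```python
class HeartbeatMonitor:
    """Hypothetical, simplified model of heartbeat-miss detection.

    Not py-amqp's code: it only illustrates the idea that a periodic
    check counts consecutive ticks with no incoming traffic and gives up
    after too many of them.
    """

    def __init__(self, max_missed=2):
        self.max_missed = max_missed  # assumed threshold, for illustration
        self.last_bytes_recv = 0      # byte counter seen at the previous tick
        self.missed = 0               # consecutive ticks with no new traffic

    def tick(self, bytes_recv):
        """Called periodically (e.g. by a timer) with total bytes received so far."""
        if bytes_recv != self.last_bytes_recv:
            # Something arrived since the last tick, so the peer looks alive.
            self.last_bytes_recv = bytes_recv
            self.missed = 0
        else:
            # Nothing arrived since the last tick: count a missed heartbeat.
            self.missed += 1
        if self.missed > self.max_missed:
            # Python 3's built-in ConnectionError stands in for the library's error.
            raise ConnectionError("Too many heartbeats missed")
```

With max_missed=2, the third consecutive tick that sees no new incoming bytes raises; any traffic in between resets the counter. In this model the error can come either from a genuinely dead connection or from the counting logic itself misbehaving, which is the distinction the answer is trying to narrow down.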

App Overview

  • Django app using Celery for periodic background tasks
  • Hosted on Heroku
  • Single task scheduled to run every 15 minutes via settings / celerybeat
  • Messaging handled via CloudAMQP add-on
  • Processes run by
    • web: newrelic-admin run-program gunicorn --workers=2 --worker-class=gevent someapp.wsgi:application
    • scheduler: newrelic-admin run-program python manage.py celery worker -B -E --maxtasksperchild=1000 --loglevel=WARNING
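For anyone trying to reproduce this setup: when the worker runs with -B (embedded beat), a periodic task like the one described above is commonly declared in Django settings through CELERYBEAT_SCHEDULE. The question does not show the actual schedule, so the entry name and the task path someapp.tasks.periodic_job below are hypothetical placeholders.

```python
# settings.py (Django): hypothetical sketch of a 15-minute beat schedule
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    # Entry name and task path are placeholders, not taken from the question.
    "periodic-job-every-15-minutes": {
        "task": "someapp.tasks.periodic_job",
        "schedule": timedelta(minutes=15),
    },
}
```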

Package Versions

Only the ones I think are relevant:

Django==1.4.3
amqp==1.0.8
billiard==2.7.3.20
celery==3.0.14
gevent==0.13.8
greenlet==0.4.0
kombu==2.5.6
raven==3.1.10

What I've Tried So Far

  • Correlating the error with activity (it doesn't seem to correlate with users visiting the app, background tasks being called, or the app idling)
  • Googling / searching SO until my fingers were numb
  • Upgrading packages to latest versions
  • Various levels of logging
  • Exception capturing with Sentry (the error doesn't appear in Sentry)
  • Reproducing the error locally: I can't reproduce it in my development environment, only in production on Heroku

Possible Relevant Info

  • I'm not sure exactly when this error first appeared (about a month ago?)
  • It may be related to the following changes (I don't recall seeing the error before them, though I'm not 100% sure)
    • celery==3.0.13 to celery==3.0.14
    • amqplib -> amqp
    • kombu==2.4.8 to kombu==2.5.4
  • The error only appears in the logs (it isn't picked up by New Relic or getsentry.com)
Jeff asked Feb 11 '13 17:02


1 Answer

How often does it happen?

It may be that the heartbeat monitoring is not working properly in your case. The heartbeat support was introduced fairly recently, so there may be bugs. I cannot reproduce this here, so I need more data to understand what is going on.

You can disable heartbeats by setting BROKER_HEARTBEAT=0. If this is a bug, the worker should then run fine, but it will no longer be able to detect a broken connection quickly. Slow detection of connection loss is only a problem in some environments (where the loss is usually caused by specific router/firewall configurations).
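A minimal Django settings sketch of that workaround might look like this. It assumes the broker settings live in Django's settings.py (as with the django-celery setup implied by `manage.py celery worker`) and that the CloudAMQP add-on exposes its broker URL in the CLOUDAMQP_URL environment variable; the localhost fallback is only an assumption for local development.

```python
# settings.py (Django): sketch of disabling AMQP heartbeats
import os

# Broker URL from the environment (assumed CloudAMQP variable name),
# falling back to a local RabbitMQ for development.
BROKER_URL = os.environ.get("CLOUDAMQP_URL", "amqp://guest:guest@localhost:5672//")

# 0 disables heartbeats: avoids the "Too many heartbeats missed" check,
# at the cost of slower detection of a dropped broker connection.
BROKER_HEARTBEAT = 0
```

As the answer notes, this trades the error for slower detection of real connection loss, so it is mainly useful for confirming whether the heartbeat check itself is at fault.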

asksol answered Oct 06 '22 13:10