
Celery + RabbitMQ + "A socket error occurred"

I'm using Celery within Django with RabbitMQ as the broker on Heroku. My RabbitMQ service is CloudAMQP Tough on Heroku. If relevant, we've been having somewhat frequent memory leaks that I've been trying to plug, but generally service isn't degraded when it happens.

When the site is heavily trafficked (like today), I start getting occasional errors like the following:

Couldn't log in: a socket error occurred

The task is completely thrown out and not registered anywhere. This is obviously a business-critical problem. My celery settings are below:

BROKER_URL = os.getenv('CLOUDAMQP_URL', DEFAULT_AMQP)
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['pickle', 'json']
CELERY_ENABLE_UTC = True
# CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True
CELERY_SEND_TASK_ERROR_EMAILS = True
CELERY_RESULT_BACKEND = False
CELERY_IMPORTS = ('business.admin', 'mainsite.views', 'utils.crons', 'mainsite.forms', )
BROKER_POOL_LIMIT = 5

# trying to clean up this memory leak
CELERYD_MAX_TASKS_PER_CHILD = 5
CELERYD_TASK_TIME_LIMIT = 60*60

I'm a bit new to celery so I'm happy to provide as follow-up whatever logs/etc will be helpful, but I'm not even sure what to provide at this point. Is there anything obvious in my settings or environment that seems like it could be causing this problem when heavily trafficked?

asked Oct 28 '14 by jdotjdot

1 Answer

The socket error might be caused by the RabbitMQ process (or another process on your host) being killed by the Linux Out-of-Memory (OOM) killer. When a server runs out of memory because processes have allocated more than is available, the Linux kernel picks a culprit and kills that process, and a RabbitMQ node that uses too much memory is a likely target. You can find out whether the OOM killer terminated a particular process using grep -i kill /var/log/messages*
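For instance, you could check the system and kernel logs like this (a generic sketch; the log file path varies by distribution, and this only applies to hosts you have shell access to, not to a managed CloudAMQP instance):

    # Search syslog for OOM-killer activity (path may be /var/log/messages or /var/log/syslog)
    grep -i kill /var/log/messages*
    # The kernel ring buffer usually records the same event
    dmesg | grep -i "out of memory"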

For more details about the Linux OOM killer and how to configure it, see:

How to Configure the Linux Out-of-Memory Killer

Do you use supervisord?

Supervisord is a handy daemon for running and monitoring processes. With it you can make sure that long-running processes such as RabbitMQ and your Celery workers are restarted automatically if they get killed.
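As a rough illustration, a supervisord program entry for a Celery worker could look like the following (the program name, paths, project name and user are placeholders, not taken from the question):

    ; hypothetical supervisord entry -- adapt command, directory and user to your deployment
    [program:celeryworker]
    command=/path/to/venv/bin/celery -A proj worker --loglevel=INFO
    directory=/path/to/project
    user=celery
    autostart=true
    autorestart=true
    stopwaitsecs=600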

There are two possible reasons for memory leaks:

  1. If settings.DEBUG is True, Django keeps a record of every database query in memory, which looks like a memory leak in long-running workers. Make sure that settings.DEBUG is set to False in the configuration your workers use (a minimal sketch follows after this list).

  2. You should consume task results if you store them; results that are never consumed accumulate and lead to memory problems. If you don't actually need the results, you can change your settings using these lines:

    # Ignore results entirely, since they are not being consumed.
    CELERY_IGNORE_RESULT = True
    CELERY_STORE_ERRORS_EVEN_IF_IGNORED = False
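For the DEBUG point above, a worker settings module might force it off with something like this sketch (the DJANGO_DEBUG environment variable name is just an illustration):

    import os

    # Default to False unless explicitly enabled; Django records every SQL query
    # in memory while DEBUG is True, which behaves like a leak in long-running workers.
    DEBUG = os.getenv('DJANGO_DEBUG', 'False').lower() == 'true'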
    
answered Sep 29 '22 by Saeed