Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Celery upgrade (3.1->4.1) - Connection reset by peer

We are working with celery at the last year, with ~15 workers, each one defined with concurrency between 1-4.

Recently we upgraded our celery from v3.1 to v4.1

Now we are having the following errors in each one of the workers logs, any ideas what can cause to such error?

2017-08-21 18:33:19,780 94794  ERROR   Control command error: error(104, 'Connection reset by peer') [file: pidbox.py, line: 46]
Traceback (most recent call last):
  File "/srv/dy/venv/lib/python2.7/site-packages/celery/worker/pidbox.py", line 42, in on_message
    self.node.handle_message(body, message)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 129, in handle_message
    return self.dispatch(**body)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 112, in dispatch
    ticket=ticket)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 135, in reply
    serializer=self.mailbox.serializer)
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/pidbox.py", line 265, in _publish_reply
    **opts
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 181, in publish
    exchange_name, declare,
  File "/srv/dy/venv/lib/python2.7/site-packages/kombu/messaging.py", line 203, in _publish
    mandatory=mandatory, immediate=immediate,
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/channel.py", line 1748, in _basic_publish
    (0, exchange, routing_key, mandatory, immediate), msg
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/abstract_channel.py", line 64, in send_method
    conn.frame_writer(1, self.channel_id, sig, args, content)
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/method_framing.py", line 178, in write_frame
    write(view[:offset])
  File "/srv/dy/venv/lib/python2.7/site-packages/amqp/transport.py", line 272, in write
    self._write(s)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 104] Connection reset by peer

BTW: our tasks in the form:

@app.task(name='EXAMPLE_TASK'],
          bind=True,
          base=ConnectionHolderTask)
def example_task(self, arg1, arg2, **kwargs):
    # task code
like image 211
ItayB Avatar asked Aug 21 '17 18:08

ItayB


1 Answers

We are also having massive issues with celery... I spend 20% of my time just dancing around weird idle-hang/crash issues with our workers sigh

We had a similar case that was caused by a high concurrency combined with a high worker_prefetch_multiplier, as it turns out fetching thousands of tasks is a good way to frack the connection.

If that's not the case: try to disable the broker pool by setting broker_pool_limit to None.

Just some quick ideas that might (hopefully) help :-)

like image 195
ACimander Avatar answered Oct 18 '22 18:10

ACimander