
Celery connection drop with AWS ELB and RabbitMQ

In our environment, we use RabbitMQ and Celery on AWS to run tasks in parallel over many nodes.

Recently we turned RabbitMQ into a cluster of 3 nodes, configured a ha policy and added an AWS elastic load balancer (ELB) for port 5672 to all 3 nodes. Our Celery workers and client code all use the ELB DNS as the broker URL.

We have noticed that since that change, waiting for async tasks to finish throws an IOError: Socket closed exception.

The ELB shuts down all idle connections after 60 seconds, but we have tasks that take a few hours to complete.

Setting BROKER_HEARTBEAT to a value lower than 60 solved the connection drops on the workers' end, but we can't find any setting that keeps the client connection alive.
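For reference, the worker-side fix looks like this in a Celery 3.x config module (the broker URL is a placeholder, and 30 is our choice of value; anything below the ELB's 60-second idle timeout should work):

```python
# celeryconfig.py -- Celery 3.x setting names
BROKER_URL = 'amqp://user:pass@my-elb-dns:5672//'  # placeholder ELB DNS

# Heartbeat interval must be below the ELB's 60 s idle timeout so the
# connection never looks idle to the load balancer.
BROKER_HEARTBEAT = 30

# How often heartbeats are checked per interval (2 is the Celery default).
BROKER_HEARTBEAT_CHECKRATE = 2
```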

Is this the correct approach to wait for long running tasks with Celery?

One workaround we haven't tested yet is to retry the AsyncResult.wait() call until it completes successfully. For example:

async_result = task.delay(params)

# Keep retrying until wait() returns instead of dying on a dropped socket.
while True:
    try:
        async_result.wait()
        break
    except IOError:  # raised when the ELB closes the idle connection
        pass
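The bare loop above retries forever; a bounded variant looks like this (a sketch, not tested against a broker -- the function name and parameters are ours, but IOError is what we see when the ELB closes the idle socket):

```python
import time

def wait_with_retries(result, max_retries=5, delay=2.0):
    """Retry result.wait() when the broker socket is dropped mid-wait.

    `result` is any object with a Celery AsyncResult-style .wait().
    Re-raises IOError once max_retries attempts are exhausted.
    """
    for attempt in range(max_retries):
        try:
            return result.wait()
        except IOError:
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)  # brief pause before re-polling the backend
```

Used as `value = wait_with_retries(task.delay(params))`, so a dropped connection costs a reconnect rather than an unhandled exception.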

We use:

  1. RabbitMQ 3.6.5
  2. Celery 3.1.20
  3. Celery broker transport is pyamqp
  4. Celery result backend is rpc
Dov asked Dec 08 '16

People also ask

Does celery use RabbitMQ?

Celery is an open-source task queue written in Python. It's lightweight, supports multiple brokers (RabbitMQ, Redis, and Amazon SQS), and integrates with many web frameworks such as Django.

How is celery different from RabbitMQ?

From my understanding, Celery is a distributed task queue, which means the only thing it should do is dispatch tasks/jobs to other servers and get the results back. RabbitMQ is a message queue, and nothing more. However, a worker could just listen to the MQ and execute the task when a message is received.

Does celery use AMQP?

celery-amqp-backend is a rewrite of Celery's original amqp:// result backend, which was removed from Celery in version 5.0. Celery encourages you to use the newer rpc:// result backend, as it does not create a new result queue for each task and is thus faster in many circumstances.

Does celery use Pika?

You can certainly use pika to implement a distributed task queue if you want, especially if you have a fairly simple use-case. Celery is just providing a "batteries included" solution for task scheduling, management, etc. that you'll have to manually implement if you decide you want them with your pika solution.


1 Answer

I believe what you need to do is extend the timeout on the AWS ELB: the connection is being closed before the task is complete. You can accomplish this by issuing the following command:

elb-modify-lb-attributes myTestELB --connection-settings "idletimeout=3600" --headers

This would give you an hour to complete the task. See https://aws.amazon.com/blogs/aws/elb-idle-timeout-control/ for more info on this.
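The command above uses the old standalone ELB CLI tools; with the current unified AWS CLI, the same change for a Classic Load Balancer looks roughly like this (load balancer name is a placeholder):

```
aws elb modify-load-balancer-attributes \
    --load-balancer-name myTestELB \
    --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":3600}}'
```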

If an hour is not enough, then you're going to have to disable connection pooling. Add these two settings to your Celery config:

BROKER_POOL_LIMIT = None
BROKER_TRANSPORT_OPTIONS = {'confirm_publish': True}

The second setting will have a performance hit since waiting for publisher confirms adds some overhead, but since you have long-running tasks this may not be an issue. It may not be strictly necessary, but I would recommend it given that you're behind a load balancer: it makes sure messages are acknowledged by the broker and not lost along the way.

Another option is to break your long task into smaller tasks. This may mean more code, but it may be worth it in the long run.
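Celery's canvas primitives (group, chain, and the task.chunks helper) support this splitting directly. As a plain-Python sketch of the idea (independent of Celery, with names of our choosing), the point is that each short unit keeps its broker interaction well under the idle timeout:

```python
def split_into_chunks(items, chunk_size):
    """Split one long unit of work into short, independently-queued units.

    With Celery this maps onto e.g. group(process.s(c) for c in chunks)
    or the built-in task.chunks(items, chunk_size) helper, where each
    chunk finishes quickly instead of holding a connection for hours.
    """
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
```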

Alex Luis Arias answered Sep 30 '22