RabbitMQ closes connection when processing long running tasks and timeout settings produce errors

I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued.

From my research, I understand that either a heartbeat or an increased connection timeout can be used to solve this. Both approaches raise errors when I try them. From reading answers to similar posts, I've also learned that RabbitMQ has changed a lot since those answers were written (e.g. the default heartbeat timeout was lowered from 580 seconds to 60 seconds in RabbitMQ 3.5.5).

When specifying a heartbeat and blocked connection timeout:

import pika

# 'XXX.XXX.XXX.XXX' is the redacted broker address; port is set elsewhere
credentials = pika.PlainCredentials('user', 'password')
parameters = pika.ConnectionParameters('XXX.XXX.XXX.XXX', port, '/', credentials, blocked_connection_timeout=2000)
connection = pika.BlockingConnection(parameters)

channel = connection.channel()

The following error is displayed:

TypeError: __init__() got an unexpected keyword argument 'blocked_connection_timeout'

When specifying heartbeat_interval=1000 in the connection parameters a similar error is shown: TypeError: __init__() got an unexpected keyword argument 'heartbeat_interval'

And similarly for socket_timeout = 1000 the following error is displayed: TypeError: __init__() got an unexpected keyword argument 'socket_timeout'

I am running RabbitMQ 3.6.1, pika 0.10.0 and python 2.7 on Ubuntu 14.04.

Why are the above approaches producing errors?
Can a heartbeat approach be used where there is a long running continuous task? For example, can heartbeats be used when performing large database joins which take 30+ mins? I am in favour of the heartbeat approach, as it is often difficult to judge how long a task such as a database join will take.

I've read through answers to similar questions.

Update: running code from the pika documentation produces the same error.
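A possible way to investigate the TypeError messages above is to ask the installed library which keyword arguments it actually declares. The sketch below is only an illustration: the helper name `unsupported_keywords` is hypothetical, and it relies on `inspect.signature`, which assumes Python 3 rather than the Python 2.7 mentioned in the question. The pika part runs only if pika is importable.

```python
import inspect


def unsupported_keywords(callable_obj, wanted):
    """Return the names in `wanted` that `callable_obj` does not declare."""
    params = inspect.signature(callable_obj).parameters
    # A **kwargs parameter means any keyword is accepted at call time.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(set(wanted) - set(params))


try:
    import pika  # optional: only inspected if it is installed
except ImportError:
    pika = None

if pika is not None:
    # Shows which release is really being imported at runtime.
    print("installed pika version:", pika.__version__)
    print("not accepted:", unsupported_keywords(
        pika.ConnectionParameters,
        ["heartbeat_interval", "socket_timeout", "blocked_connection_timeout"],
    ))
```

If a keyword shows up in the "not accepted" list, the imported pika release does not define it under that name, which would be consistent with the errors shown above.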

asked Mar 21 '16 04:03

Greg

1 Answer

I've run into the same problem in my own systems: connections dropping during very long tasks.

It's possible the heartbeat might help keep your connection alive, if your network setup is such that idle TCP/IP connections are forcefully dropped. If that's not the case, though, changing the heartbeat won't help.

Changing the connection timeout won't help at all. This setting is only used when initially creating the connection.

You wrote: "I am using a RabbitMQ producer to send long running tasks (30 mins+) to a consumer. The problem is that the consumer is still working on a task when the connection to the server is closed and the unacknowledged task is requeued."

There are two reasons this setup causes trouble, both of which you have run into already:

1. Connections drop randomly, even under the best of circumstances.
2. Restarting a process because of a requeued message can cause problems.

Having deployed RabbitMQ code with tasks ranging from less than a second to several hours, I have found that acknowledging the message immediately and updating the system with status messages works best for very long tasks like this.

You will need to have a system of record (probably with a database) that keeps track of the status of a given job.

When the consumer picks up a message and starts the process, it should acknowledge the message right away and send a "started" status message to the system of record.

When the process completes, send another status message to say it's done.

This won't solve the dropped connection problem, but nothing will 100% solve that anyways. Instead, it will prevent the message re-queueing problem from happening when a connection is dropped.
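The acknowledge-first flow described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the `jobs` table, the helper names, and the use of SQLite as the system of record are all assumptions, and the status is written straight to the database rather than sent as a separate status message. The handler uses the `(channel, method, properties, body)` callback shape and `basic_ack(delivery_tag=...)` call that pika provides.

```python
import sqlite3


def open_job_store(path=":memory:"):
    """Create (if needed) a hypothetical `jobs` table as the system of record."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    db.commit()
    return db


def set_status(db, job_id, status):
    """Insert or update the recorded status of one job."""
    db.execute(
        "INSERT OR REPLACE INTO jobs (job_id, status) VALUES (?, ?)",
        (job_id, status),
    )
    db.commit()


def make_handler(db, run_task):
    """Build a consumer callback with pika's (channel, method, properties, body) shape."""
    def on_message(channel, method, properties, body):
        job_id = body.decode("utf-8")  # assumption: the body is just a job id
        # Acknowledge first, so a dropped connection cannot requeue the message.
        channel.basic_ack(delivery_tag=method.delivery_tag)
        set_status(db, job_id, "started")
        run_task(job_id)  # the long-running work
        set_status(db, job_id, "done")
    return on_message
```

If `run_task` raises, the job stays marked as "started", which is the signal a recovery step can look for later. How the handler is registered with the channel differs between pika releases, so check the documentation for the installed version.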

This solution does introduce another problem, though: when the long running process crashes, how do you resume the work?

The basic answer is to use the system of record (your database) status for the job to tell you that you need to pick up that work again. When the app starts, check the database to see if there is work that is unfinished. If there is, resume or restart that work in whatever manner is appropriate.
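The startup check could look something like the sketch below. It assumes a hypothetical `jobs` table with `job_id` and `status` columns in SQLite, and it assumes each task is safe to restart from scratch; neither detail comes from the answer itself, so adapt both to the real system of record.

```python
import sqlite3


def find_unfinished(db):
    """Return job ids whose last recorded status is not 'done'."""
    rows = db.execute(
        "SELECT job_id FROM jobs WHERE status != 'done' ORDER BY job_id"
    )
    return [row[0] for row in rows]


def resume_unfinished(db, run_task):
    """Re-run every unfinished job and mark each one done afterwards."""
    resumed = []
    for job_id in find_unfinished(db):
        run_task(job_id)  # assumption: the task is safe to restart from scratch
        db.execute("UPDATE jobs SET status = 'done' WHERE job_id = ?", (job_id,))
        db.commit()
        resumed.append(job_id)
    return resumed


if __name__ == "__main__":
    # Tiny demonstration with an in-memory database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    db.executemany(
        "INSERT INTO jobs VALUES (?, ?)",
        [("a", "done"), ("b", "started"), ("c", "started")],
    )
    print(resume_unfinished(db, lambda job_id: None))
```

Calling `resume_unfinished` once when the consumer starts, before it begins consuming new messages, keeps recovery separate from normal message handling.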

answered Sep 19 '22 08:09

Derick Bailey

Tags: python, rabbitmq, amqp, pika, python-pika
