ZMQ Pub-Sub Program Failure When Losing Network Connectivity

I have a simple pub-sub setup on a mid-sized network, using ZMQ 2.1. Some subscribers use the C# bindings and others use the Python bindings; the issue I'm having is the same with both.

If I pull the network cable from a machine running a subscriber, I get an uncatchable error that immediately terminates that subscriber.

Here's a very simple example of a subscriber in Python (not actual production code, but enough to reproduce the problem):

import zmq

def main(server_address, port):
    context = zmq.Context()
    sub_socket = context.socket(zmq.SUB)
    sub_socket.connect("tcp://" + server_address + ":" + str(port))
    # Only receive messages whose topic starts with this prefix.
    sub_socket.setsockopt(zmq.SUBSCRIBE, b"KITH1S2")

    while True:
        msg = sub_socket.recv()  # blocks until a matching message arrives
        print(msg)

if __name__ == "__main__":
    main("company-intranet", 4000)

In C# the program simply terminates silently. In Python I at least get this:

Assertion failed: rc == 0 (....\src\zmq_connector.cpp:48)

This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.

I've tried non-blocking versions and poller-based versions, but in both cases the instant termination persists. Is there something obvious I should be doing but I'm not? (That is, obvious to someone else :) ).

EDIT:

Found the following: https://zeromq.jira.com/browse/LIBZMQ-207

Seems as though it is/was a known issue.

That issue links on to GitHub, where the changelog for 2.1.10 has this note:

  • Fixed issue 207, assertion failure in zmq_connecter.cpp:48, when an invalid zmq_connect() string was used, or the hostname could not be resolved. The zmq_connect() call now returns -1 in both those cases.

Although connect() does indeed throw an Invalid Argument exception in Python (not C# apparently?), recv() still fails. If the subscriber machine suddenly loses the network, that subscriber will simply stop functioning.

So - I'm going to try using IP addresses instead of named addresses to see if this will bypass the issue. Not ideal, but better than insta-crash.

asked Nov 30 '11 22:11 by jomido


1 Answer

Original question: Is there something obvious I should be doing but I'm not?

No.

The workaround for now is to connect using the server's IP address instead of its hostname. With ZMQ 2.1.x, a subscriber connected this way does not terminate when the network connection is lost.
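If hard-coding addresses is undesirable, one possible variation on this workaround is to resolve the hostname once at startup and hand ZMQ the numeric address. The sketch below is only an illustration under that assumption: the helper names `resolve_endpoint` and `connect_by_ip` are hypothetical, and it has not been verified against the 2.1.x assertion failure itself. The lookup uses Python's standard `socket.gethostbyname`, which fails with an ordinary, catchable exception if the name cannot be resolved.

```python
import socket


def resolve_endpoint(server_address, port, resolve=socket.gethostbyname):
    """Build a tcp:// endpoint that carries a numeric IPv4 address.

    Hypothetical helper: the hostname is resolved here, in Python, so a
    failed lookup surfaces as socket.gaierror instead of inside libzmq.
    """
    ip_address = resolve(server_address)
    return "tcp://{0}:{1}".format(ip_address, port)


def connect_by_ip(sub_socket, server_address, port, resolve=socket.gethostbyname):
    """Connect any object with a connect(endpoint) method, e.g. a zmq SUB socket."""
    endpoint = resolve_endpoint(server_address, port, resolve)
    sub_socket.connect(endpoint)
    return endpoint


# Usage with the subscriber from the question (assumes pyzmq is installed):
#   sub_socket = zmq.Context().socket(zmq.SUB)
#   connect_by_ip(sub_socket, "company-intranet", 4000)
```

The trade-off is that the address is fixed at startup, so a later DNS change for the server is not picked up until the subscriber restarts.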

answered Oct 20 '22 23:10 by jomido