Why won't ZMQ drop messages?

Question

I have an application which fetches messages from a ZeroMQ publisher, using a PUB/SUB setup. The reader is slow sometimes so I set a HWM on both the sender and receiver. I expect that the receiver will fill the buffer and jump to catch up when it recovers from processing slowdowns. But the behavior that I observe is that it never drops! ZeroMQ seems to be ignoring the HWM. Am I doing something wrong?

Here's a minimal example:

publisher.py

import zmq
import time

ctx = zmq.Context()
sock = ctx.socket(zmq.PUB)

sock.setsockopt(zmq.SNDHWM, 1)

sock.bind("tcp://*:5556")

i = 0

while True:
    sock.send(str(i))
    print i
    time.sleep(0.1)
    i += 1

subscriber.py

import zmq
import time

ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
sock.setsockopt(zmq.SUBSCRIBE, "")
sock.setsockopt(zmq.RCVHWM, 1)
sock.connect("tcp://localhost:5556")

while True:
    print sock.recv()
    time.sleep(0.5)

Jason · Accepted Answer

I believe there are a couple things at play here:

High Water Marks are not exact (see the last paragraph in the linked section) - typically this means the real queue size will be smaller than your listed number, I don't know how this will behave at 1.
Your PUB HWM will never drop messages... due to the way PUB sockets work, it will always immediately processes the message whether there is an available subscriber or not. So unless it actually takes ZMQ .1 seconds to process the message through the queue, your HWM will never come into play on the PUB side.

What should be happening is something like the following (I'm assuming an order of operations that would allow you to actually receive the first published message):

Start up subscriber.py & wait a suitable period to make sure it's completely spun up (basically immediately)
Start up publisher.py
PUB processes and sends the first message, SUB receives and processes the first message
PUB sleeps for .1 seconds and processes & sends the second message
SUB sleeps for .5 seconds, the socket receives the second message but sits in queue until the next call to sock.recv() processes it
PUB sleeps for .1 seconds and processes & sends the third message
SUB is still sleeping for another .3 seconds, so the third message should hit the queue behind the second message, which would make 2 messages in the queue, and the third one should drop due to the HWM

... etc etc etc.

I suggest the following changes to help troubleshoot the issue:

Remove the HWM on your publisher... it does nothing but add a variable we don't need to deal with in your test case, since we never expect it to change anything. If you need it for your production environment, add it back in and test it in a high volume scenario later.
Change the HWM on your subscriber to 50. It'll make the test take longer, but you won't be at the extreme edge case, and since the ZMQ documentation states that the HWM isn't exact, the extreme edge cases could cause unexpected behavior. Mind you, I believe your test (being small numbers) wouldn't do that, but I haven't looked at the code implementing the queues so I can't say with certainty, and it may be possible that your data is small enough that your effective HWM is actually larger.
Change your subscriber sleep time to 3 full seconds... in theory, if your queue holds up to exactly 50 messages, you'll saturate that within two loops (just like you do now), and then you'll have to wait 2.5 minutes to work through those messages to see if you start getting skips, which after the first 50 messages should start jumping large groups of numbers. But I'd wait at least 5-10 minutes. If you find that you start skipping after 100 or 200 messages, then you're being bitten by the smallness of your data.

This of course doesn't address what happens if you still don't skip any messages... If you do that and still experience the same issue, then we may need to dig more into how high water marks actually work, there may be something we're missing.

ZFY · Answer

I met exactly the same problem, and my demo is nearly the same with yours, the subscriber or publisher won't drop any message after either zmq.RCVHWM or zmq.SNDHWM is set to 1.

I walk around after referring to the suicidal snail pattern for slow subscriber detection in Chap.5 of zguide. Hope it helps.

BTW: would you please let me know if you've solved the bug of zmq.HWM ?

Why won't ZMQ drop messages?

Tags:

python

zeromq

pyzmq

Brendan Howell

2 Answers

Jason

ZFY

Recent Activity

Donate For Us

Why won't ZMQ drop messages?

Tags:

python

zeromq

pyzmq

Brendan Howell

2 Answers

Jason

ZFY

Related questions

Recent Activity

Donate For Us