Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why won't ZMQ drop messages?

I have an application which fetches messages from a ZeroMQ publisher, using a PUB/SUB setup. The reader is slow sometimes so I set a HWM on both the sender and receiver. I expect that the receiver will fill the buffer and jump to catch up when it recovers from processing slowdowns. But the behavior that I observe is that it never drops! ZeroMQ seems to be ignoring the HWM. Am I doing something wrong?

Here's a minimal example:

publisher.py

import zmq
import time

ctx = zmq.Context()
sock = ctx.socket(zmq.PUB)

sock.setsockopt(zmq.SNDHWM, 1)

sock.bind("tcp://*:5556")

i = 0

while True:
    sock.send(str(i))
    print i
    time.sleep(0.1)
    i += 1

subscriber.py

import zmq
import time

ctx = zmq.Context()
sock = ctx.socket(zmq.SUB)
sock.setsockopt(zmq.SUBSCRIBE, "")
sock.setsockopt(zmq.RCVHWM, 1)
sock.connect("tcp://localhost:5556")

while True:
    print sock.recv()
    time.sleep(0.5)
like image 281
Brendan Howell Avatar asked Nov 11 '22 07:11

Brendan Howell


2 Answers

I believe there are a couple things at play here:

  1. High Water Marks are not exact (see the last paragraph in the linked section) - typically this means the real queue size will be smaller than your listed number, I don't know how this will behave at 1.
  2. Your PUB HWM will never drop messages... due to the way PUB sockets work, it will always immediately processes the message whether there is an available subscriber or not. So unless it actually takes ZMQ .1 seconds to process the message through the queue, your HWM will never come into play on the PUB side.

What should be happening is something like the following (I'm assuming an order of operations that would allow you to actually receive the first published message):

  1. Start up subscriber.py & wait a suitable period to make sure it's completely spun up (basically immediately)
  2. Start up publisher.py
  3. PUB processes and sends the first message, SUB receives and processes the first message
  4. PUB sleeps for .1 seconds and processes & sends the second message
  5. SUB sleeps for .5 seconds, the socket receives the second message but sits in queue until the next call to sock.recv() processes it
  6. PUB sleeps for .1 seconds and processes & sends the third message
  7. SUB is still sleeping for another .3 seconds, so the third message should hit the queue behind the second message, which would make 2 messages in the queue, and the third one should drop due to the HWM

... etc etc etc.

I suggest the following changes to help troubleshoot the issue:

  1. Remove the HWM on your publisher... it does nothing but add a variable we don't need to deal with in your test case, since we never expect it to change anything. If you need it for your production environment, add it back in and test it in a high volume scenario later.
  2. Change the HWM on your subscriber to 50. It'll make the test take longer, but you won't be at the extreme edge case, and since the ZMQ documentation states that the HWM isn't exact, the extreme edge cases could cause unexpected behavior. Mind you, I believe your test (being small numbers) wouldn't do that, but I haven't looked at the code implementing the queues so I can't say with certainty, and it may be possible that your data is small enough that your effective HWM is actually larger.
  3. Change your subscriber sleep time to 3 full seconds... in theory, if your queue holds up to exactly 50 messages, you'll saturate that within two loops (just like you do now), and then you'll have to wait 2.5 minutes to work through those messages to see if you start getting skips, which after the first 50 messages should start jumping large groups of numbers. But I'd wait at least 5-10 minutes. If you find that you start skipping after 100 or 200 messages, then you're being bitten by the smallness of your data.

This of course doesn't address what happens if you still don't skip any messages... If you do that and still experience the same issue, then we may need to dig more into how high water marks actually work, there may be something we're missing.

like image 162
Jason Avatar answered Nov 15 '22 08:11

Jason


I met exactly the same problem, and my demo is nearly the same with yours, the subscriber or publisher won't drop any message after either zmq.RCVHWM or zmq.SNDHWM is set to 1.

I walk around after referring to the suicidal snail pattern for slow subscriber detection in Chap.5 of zguide. Hope it helps.

BTW: would you please let me know if you've solved the bug of zmq.HWM ?

like image 36
ZFY Avatar answered Nov 15 '22 06:11

ZFY