I have an application that periodically needs to send out a snapshot of its current state, which is currently represented by about 500,000 64-byte messages. I've been having difficulty getting this many messages sent and received quickly and reliably using ZMQ.
I've been using PUB/SUB over tcp to do this so far, but I'm not wedded to either the pattern or the transport as long as it gets the job done. In my experiments I've focused on playing with the send and receive high-water marks, the send and receive buffer sizes, and adding some sleeps to the send loop to slow it down a bit. Even with settings that seemed quite generous to me (500K HWM, 10 MB buffers), and using only a loopback connection, the messages still aren't all being received consistently.
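For concreteness, here is roughly how I've been applying those settings (C API; the loopback endpoint and the values shown are just what I've been experimenting with):

```c
#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();

    /* publisher: options must be set before bind */
    void *pub = zmq_socket (ctx, ZMQ_PUB);
    int hwm = 500000;               /* high-water mark, in messages */
    int buf = 10 * 1024 * 1024;     /* 10 MB kernel buffer */
    zmq_setsockopt (pub, ZMQ_SNDHWM, &hwm, sizeof hwm);
    zmq_setsockopt (pub, ZMQ_SNDBUF, &buf, sizeof buf);
    zmq_bind (pub, "tcp://127.0.0.1:5556");

    /* subscriber, mirrored */
    void *sub = zmq_socket (ctx, ZMQ_SUB);
    zmq_setsockopt (sub, ZMQ_RCVHWM, &hwm, sizeof hwm);
    zmq_setsockopt (sub, ZMQ_RCVBUF, &buf, sizeof buf);
    zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "", 0);   /* receive everything */
    zmq_connect (sub, "tcp://127.0.0.1:5556");

    /* send loop with sleeps elided */
    zmq_close (sub);
    zmq_close (pub);
    zmq_ctx_term (ctx);
    return 0;
}
```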
I'm interested in what the appropriate settings are for these (or other) tuning parameters, and more broadly in how to reason about the effect the various settings will have.
Some further details that may help provide an appropriate answer:
The distribution is one-to-many. The expected number of recipients is around 20.
Each message represents a set of information about a different financial instrument, all observed at the same time. In my mind, arguments can be made both for combining them into one big message (the set of all messages logically makes up one complete snapshot) and for keeping them separate (clients may only be interested in some instruments, and separate messages would make it easier to filter them out; see the sketch after these details).
The intended frequency of messages is no faster than every 20 milliseconds and no slower than every 5 seconds. Where I actually land will probably be influenced by performance considerations (i.e., how fast my server can actually pump the messages out and what kind of data rate would overwhelm the clients).
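To put rough numbers on that last point: a full snapshot is about 500,000 × 64 B ≈ 32 MB. Over tcp each of the ~20 subscribers receives its own copy, so one round is roughly 640 MB on the wire; at the 20 ms end of the range that would be about 32 GB/s aggregate, while at the 5 s end it is about 128 MB/s (~1 Gbit/s).

As for the filtering idea mentioned above, what I have in mind is ZMQ's ordinary prefix subscriptions, along these lines (the instrument IDs and the leading topic frame are illustrative, not settled design):

```c
#include <string.h>
#include <zmq.h>

/* subscriber opts in to a few instrument IDs; everything else is
   filtered out by ZMQ's prefix matching on the first frame */
static void subscribe_instruments (void *sub)
{
    zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "AAPL", 4);
    zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "MSFT", 4);
}

/* publisher sends the instrument ID as a leading topic frame,
   followed by the 64-byte record */
static void publish_record (void *pub, const char *instrument,
                            const void *record)
{
    zmq_send (pub, instrument, strlen (instrument), ZMQ_SNDMORE);
    zmq_send (pub, record, 64, 0);
}
```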
Let's break this down.
First, why the HWM isn't "working":
The HWM is not an exact limit: the internal buffers are filled and emptied by two separate threads, so the count of available space can lag considerably when there's a lot of activity. The 0MQ zmq_setsockopt man page says, "0MQ does not guarantee that the socket will accept as many as ZMQ_SNDHWM messages, and the actual limit may be as much as 60-70% lower depending on the flow of messages on the socket."
Second, why you are losing messages:
As you dump 0.5M messages (× 20 subscribers) into the sockets' buffers, you will sporadically hit the HWM, and a PUB socket's behaviour when that happens is to silently drop the messages it can't queue.
Third, how to solve this:
There's zero reason to break the state into separate messages; the only rationale for that would be if the state did not fit into memory, and it fits easily. Send the snapshot as one multipart message (ZMQ_SNDMORE on every frame but the last); multipart messages are delivered atomically, so the whole snapshot effectively takes one slot in the outgoing buffer.
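In the C API that looks roughly like the sketch below; the record array, fixed 64-byte size, and function names are placeholders for your actual state:

```c
#include <stddef.h>
#include <zmq.h>

/* send the whole snapshot as one logical message: every frame but the
   last carries ZMQ_SNDMORE, so the queue treats it as a single unit */
static void send_snapshot (void *pub, const char records[][64], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int flags = (i + 1 < count) ? ZMQ_SNDMORE : 0;
        zmq_send (pub, records[i], 64, flags);
    }
}

/* subscriber: keep reading frames while ZMQ_RCVMORE reports more to come */
static void recv_snapshot (void *sub)
{
    int more = 1;
    size_t more_len = sizeof more;
    while (more) {
        char frame[64];
        zmq_recv (sub, frame, sizeof frame, 0);
        /* process the frame here */
        zmq_getsockopt (sub, ZMQ_RCVMORE, &more, &more_len);
    }
}
```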
Then remove your 500K HWM limit and revert to the default (1000), which will be more than sufficient.
Fourth, how to get better performance:
Obviously, profile and improve your publisher and subscriber code where possible; these are the usual bottlenecks.
Then consider some form of compression on the message if it is sparse and you can do that without too much CPU cost. With 20 subscribers you will usually gain more from the reduced network load than you lose in CPU time.
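For example, with zlib (a sketch; the function name is illustrative, and whether it pays off depends on how compressible the snapshot actually is):

```c
#include <stdlib.h>
#include <zlib.h>

/* compress the serialized snapshot before handing it to zmq_send();
   returns the compressed length, or 0 on failure (caller frees *dst) */
static size_t compress_snapshot (const unsigned char *src, size_t src_len,
                                 unsigned char **dst)
{
    uLongf dst_len = compressBound (src_len);   /* worst-case output size */
    *dst = malloc (dst_len);
    if (*dst == NULL || compress (*dst, &dst_len, src, src_len) != Z_OK)
        return 0;
    return (size_t) dst_len;
}
```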
Finally, if you grow to more subscribers and it's a critical system, look at PGM multicast: the publisher then sends each message once regardless of the subscriber count, which effectively removes the per-subscriber network cost.
After a day of experimenting semi-randomly with various combinations, I've come to the following tentative conclusions:
Adding sleep statements in my send loop to limit the message rate improves reliability with basically any set of options.
Sending the 500,000 messages as frames of a single message instead of 500K individual messages improves reliability.
Using the epgm transport rather than tcp allows higher throughput to be achieved.
With epgm, the multicast rate option (ZMQ_RATE) needs to match the actual message rate produced by the sleep statements (see the sketch after this list).
Increasing the high water mark and the buffers helps reliability, but you have to increase both settings, and do it on both the client and the server; if they are not all raised in combination, it tends not to help. You have to set them quite high to get any kind of reliability when sending individual messages (as opposed to frames of a single message). In that case, I didn't get good results until I had the high water marks set to 1,000,000 and the buffers set to 65 MB, twice the size of the set of messages I was trying to send. This was far higher than I instinctively thought to try. That was with a 5-second pause between each round of 500K messages; bringing the interval down to 1 second, I had to push them even higher, to 4 times the size of a single batch of messages.
With epgm, the recovery interval setting (ZMQ_RECOVERY_IVL) does not help much.
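For reference, the epgm configuration I ended up testing looked roughly like this (the interface name, multicast group, and numbers are placeholders for my actual setup):

```c
#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *pub = zmq_socket (ctx, ZMQ_PUB);

    int rate = 100000;    /* ZMQ_RATE is in kilobits/sec; it needs to match
                             the rate the send-loop sleeps actually allow */
    int ivl = 10000;      /* ZMQ_RECOVERY_IVL in ms; made little difference */
    zmq_setsockopt (pub, ZMQ_RATE, &rate, sizeof rate);
    zmq_setsockopt (pub, ZMQ_RECOVERY_IVL, &ivl, sizeof ivl);

    /* epgm endpoints name an interface and a multicast group:port */
    zmq_connect (pub, "epgm://eth0;239.192.1.1:5556");

    /* send loop elided */
    zmq_close (pub);
    zmq_ctx_term (ctx);
    return 0;
}
```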