Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to delta encode a C/C++ struct for transmission via sockets

I need to send a C struct over the wire (using UDP sockets, and possibly XDR at some point) at a fairly high update rate, which potentially causes lots of redundant and unnecessary traffic at several khz.

This is because, some of the data in the struct may not have changed at times, so I thought that delta-encoding the current C struct against the previous C struct would seem like a good idea, pretty much like a "diff".

But I am wondering, what's the best approach of doing something like this, ideally in a portable manner that also ensures that data integrity is maintained? Would it be possible to simply XOR the data and proceed like this?

Similarly, it would be important that the approach remains extensible enough, so that new fields can be added to the struct or reordered if necessary (padding), which sounds as if it'd require versioning information, as well.

Any ideas or pointers (are there existing libraries?) would be highly appreciated!

Thanks

EDIT: Thanks to everyone one who provided an answer, the level of detail is really appreciated, I realize that I probably should not have mentioned UDP though, because that is in fact not the main problem, because there is already a corresponding protocol implemented on top of UDP that accounts for the mentioned difficulties, so the question was really meant to be specific to feasible means of delta encoding a struct, and not so much about using UDP in particular as a transport mechanism.

like image 916
guest Avatar asked Oct 24 '09 17:10

guest


3 Answers

UDP does not guarantee that a given packet was actually received, so encoding whatever you transmit as a "difference from last time" is problematic -- you can't know that your counterpart has the same idea as you about what the "last time" was. Essentially you'd have to build some overhead on top of UDP to check what packets have been received (tagging each packet with a unique ID) -- everybody who's tried to go this route will agree that more often than not you find yourself more or less duplicating the TCP streaming infrastructure on top of UDP... only, most likely, not as solid and well-developed (although admittedly sometimes you can take advantage of very special characteristics of your payloads in order to gain some modest advantage over plain good old TCP).

Does your transmission need to be one-way, sender to receiver? If that's the case (i.e., it's not acceptable for the receiver to send acknowledgments or retransmits) then there's really not much you can do along these lines. The one thing that comes to mind: if it's OK for the receiver to be out of sync for a while, then the sender could send two kinds of packets -- one with a complete picture of the current value of the struct, and an identifying unique tag, to be sent at least every (say) 5 minutes (so realistically the receiver may be out of sync for up to 15 minutes if it misses two of these "big packets"); one with just an update (diff) from the last "big packet", including the big packet's identifying unique tag and (e.g.) a run-length-encoded version of the XOR you mention.

Of course once having prepared the run-length-encoded version, the server will compare its size vs the size of the whole struct, and only send the delta-kind of packet if the savings are substantial, otherwise it might as well send the big-packet a bit earlier than needed (gains in reliability). The received will keep track of the last big-packet unique tag it has received and only apply deltas which pertain to it (helps against missing packets and packets delivered out of order, depending how sophisticated you want to make your client).

The need for versioning &c, depending on what exactly you mean (will senders and receivers with different ideas about how the struct's C layout should look need to communicate regularly? how do they handshake about what versions are know to both? etc), will add a whole further universe of complications, but that is really another question, and your core question as summarized in the title is already plenty big enough;-).

If you can afford occasional meta-messages from the receiver back to the sender (acks or requests to resend) then depending on the various numerical parameters in play you may design different strategies. I suspect acks would have to be pretty frequent to do much good, so a request to resend a big-packet (either a specifically identified one or "whatever you have that's freshest") may be the best meta-strategy to cull the options space (which otherwise threatens to explode;-). If so then the sender may be blissfully ignorant of whatever strategy the receiver is using to request big-packet-resends, and you can experiment on the receiver side with various such strategies without needing to redeploy the sender as well.

It's hard to offer much more help without some specifics, i.e., at least ballpark numbers for all the numerical parameters -- packet sizes, frequencies of sending, how long is it tolerable for the sender to be out of sync with the receiver, a bundle of network parameters, etc etc. But I hope this somewhat generic analysis and suggestions still help.

like image 109
Alex Martelli Avatar answered Oct 17 '22 16:10

Alex Martelli


To delta encode:

1) Send "key frames" periodically (e.g. once a second). A key frame is a complete copy (rather than a delta) so that if you lose comms for any reason, you only lose a small amount of data before you can "aquire the signal" again. Use a simple packet header that allows you to detect the start of a packet and know what type of data it contains.

2) Calculate the delta from the previous packet and encode that in a compact form. By examining the type of data you are sending and the way it typically changes, you should be able to devise a pretty compact delta. However, you may need to check the size of the delta - in some cases it may not be an efficient encoding - if it's bigger than a key frame you can just send another key frame instead. You can also decide at this point whether your deltas are lossy or lossless.

3) Add a CRC check to the packet (search for CRC32). This will allow the receiver to verify that the packet has been received intact, allowing them to skip invalid packets.

NOTES:

  • Be careful about doing this over UDP - it gives no guarantee that your packets will arrive in the same order you sent them. Obviously a delta will only work if packets are in order. In this case, you will need to add some form of sequence ID to each packet (first packet is "1", second packet is "2" etc) so that you can detect out-of-order receiving. You may even need to keep a buffer of "n" packets in the receiver so that you can reassemble them in the correct order when you come to decode them (but of course, this could introduce some latency). You will probably also miss some packets over UDP, in which case you'll need to wait until the next keyframe before you'll be able to "re-aquire the signal" - so the key frames must be frequent enough to avoid catastrophic outages in your comms.

  • Consider using compression (e.g. zip etc). You may find a full packet can be built in a zip-friendly manner (e.g. rearrage data to group bytes that are likely to have similar values (especially zeros) together) and then compressed so well that it is smaller than an uncompressed delta, and you won't need to go to all the effort of deltas at all (and you won't have to worry about packet ordering etc).

edit - Always use a version number (or packet type) in your packets so you can add new fields or change the delta encoding in the future! You'll need this for differentiating key/delta frames anyway.

like image 4
Jason Williams Avatar answered Oct 17 '22 16:10

Jason Williams


I'm not convinced that delta encoding values on UDP - which is inherently unreliable and out of order - is going to be particularly easy. Instead, I'd send an ID of the field which has changed and its current value. That also doesn't require anything to change if you want to add extra fields to the data structure you're sending. If you want a standard way of doing this, look at SNMP; that may be something you can drop in, or it may be a bit baggy for you (it qualifies the field names globally and uses ASN.1 - both of which give maximum interoperability, but at the cost of some bytes in the packet).

like image 1
Pete Kirkham Avatar answered Oct 17 '22 18:10

Pete Kirkham