What would be the best compression algorithm to use to compress packets before sending them over the wire? The packets are encoded using JSON. Would LZW be a good one for this or is there something better?
As text data, JSON data compresses nicely. That's why gzip is our first option to reduce the JSON data size. Moreover, it can be automatically applied in HTTP, the common protocol for sending and receiving JSON.
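As a rough illustration of the gzip route, here is a minimal Python sketch using only the standard library. The helper names `compress_packet` and `decompress_packet` and the sample payload are made up for this example; over HTTP you would normally let the server and client negotiate `Content-Encoding: gzip` rather than compressing by hand.

```python
import gzip
import json


def compress_packet(obj) -> bytes:
    """Serialize obj to compact JSON text and gzip the UTF-8 bytes."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decompress_packet(blob: bytes):
    """Reverse of compress_packet: gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical payload with many repeated keys, which is where gzip shines.
packet = {
    "vectors": [
        {"latitude": i, "longitude": i + 2, "altitude": i + 4}
        for i in range(100)
    ]
}
raw_size = len(json.dumps(packet, separators=(",", ":")))
blob = compress_packet(packet)
print(f"raw JSON: {raw_size} bytes, gzipped: {len(blob)} bytes")
```

How much you save depends heavily on how repetitive the payload is; very small packets can even grow slightly because of the gzip header.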
Lempel–Ziv–Welch (LZW) is a lossless compression algorithm developed in 1984. It is used in the GIF format, introduced in 1987. DEFLATE, a lossless compression algorithm specified in 1996, is used in the Portable Network Graphics (PNG) format.
The fastest algorithm, LZ4, gives lower compression ratios; xz, which has the highest compression ratio, suffers from slow compression speed. Zstandard, at its default setting, shows substantial improvements in both compression and decompression speed while compressing at a ratio comparable to zlib's.
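The trade-off is easy to measure on your own data. The sketch below compares only zlib and xz, since both ship with Python's standard library; LZ4 and Zstandard are left out because they typically need third-party packages. The `measure` helper and the sample payload are assumptions for illustration, and the timings will vary from machine to machine.

```python
import json
import lzma
import time
import zlib


def measure(name, compress, decompress, data: bytes) -> dict:
    """Compress data once, verify the round trip, and report size and time."""
    start = time.perf_counter()
    blob = compress(data)
    elapsed = time.perf_counter() - start
    if decompress(blob) != data:
        raise ValueError(f"{name} round trip failed")
    return {"codec": name, "bytes": len(blob), "seconds": elapsed}


# Hypothetical sample: a few thousand bytes of repetitive JSON.
sample = json.dumps(
    [{"latitude": i, "longitude": i * 2, "altitude": i * 3} for i in range(500)]
).encode("utf-8")

results = [
    measure("zlib", zlib.compress, zlib.decompress, sample),
    measure("xz", lzma.compress, lzma.decompress, sample),  # lzma defaults to the .xz format
]
for row in results:
    print(row)
```

Running this against a sample of real packets is a better guide than any general ranking, because the results shift with packet size and content.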
I think two questions will affect your answer:
1) How well can you predict the composition of the data without knowing what will happen on any particular run of the program? For instance, if your packets look like this:
{
    "vector": {
        "latitude": 16,
        "longitude": 18,
        "altitude": 20
    },
    "vector": {
        "latitude": -8,
        "longitude": 13,
        "altitude": -5
    },
    [... et cetera ...]
}
-- then you would probably get your best compression by creating a hard-coded dictionary of the text strings that keep showing up in your data and replacing each occurrence of one of those strings with the appropriate dictionary index. (Actually, if your data were this regular, you'd probably want to send just the values over the wire and simply write a function into the client to construct a JSON object from the values if a JSON object is needed.)
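To make the "send just the values" idea concrete, here is a small Python sketch. It assumes both ends agree on a fixed field order and that every value fits in a signed 16-bit integer; `FIELDS`, `RECORD`, `pack_vectors` and `unpack_vectors` are names invented for this example, and the vectors are held in a list rather than under repeated "vector" keys.

```python
import json
import struct

# Field order that sender and receiver both hard-code (an assumption).
FIELDS = ("latitude", "longitude", "altitude")
# Three little-endian signed 16-bit integers per record (assumes values fit).
RECORD = struct.Struct("<3h")


def pack_vectors(vectors) -> bytes:
    """Send only the values: 6 bytes per vector, no key names at all."""
    return b"".join(RECORD.pack(*(v[f] for f in FIELDS)) for v in vectors)


def unpack_vectors(blob: bytes) -> list:
    """Rebuild the list of dicts on the receiving side."""
    return [dict(zip(FIELDS, values)) for values in RECORD.iter_unpack(blob)]


vectors = [
    {"latitude": 16, "longitude": 18, "altitude": 20},
    {"latitude": -8, "longitude": 13, "altitude": -5},
]
wire = pack_vectors(vectors)
rebuilt_json = json.dumps(unpack_vectors(wire))
print(len(json.dumps(vectors)), "bytes as JSON vs", len(wire), "bytes packed")
```

The cost is that the format is no longer self-describing: any change to the fields has to be rolled out to both ends at once.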
If you cannot predict which keys will be used, you may need to use LZW, or LZ77, or another method that looks at the data which has already gone through to find the data it can express in an especially compact form. However...
2) Do the packets need to be compressed separately from each other? If so, then LZW is definitely not the method you want; it will not have time to build its dictionary up to a size that will give substantial compression results by the end of a single packet. The only chance of getting really substantial compression in this scenario, IMHO, is to use a hard-coded dictionary.
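One way to get a hard-coded dictionary without writing your own codec is zlib's preset-dictionary feature, which Python exposes through the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`. This is a different mechanism from the index-substitution scheme described above, and the dictionary contents below are only a guess at what a real one would contain; both sides must ship exactly the same bytes.

```python
import json
import zlib

# Hypothetical preset dictionary: the strings expected in every packet.
ZDICT = b'{"vector":{"latitude":,"longitude":,"altitude":}}'


def compress_packet(obj) -> bytes:
    """Compress one packet on its own, primed with the shared dictionary."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    compressor = zlib.compressobj(level=9, zdict=ZDICT)
    return compressor.compress(raw) + compressor.flush()


def decompress_packet(blob: bytes):
    """Decompress a packet using the same shared dictionary."""
    decompressor = zlib.decompressobj(zdict=ZDICT)
    raw = decompressor.decompress(blob) + decompressor.flush()
    return json.loads(raw.decode("utf-8"))


packet = {"vector": {"latitude": 16, "longitude": 18, "altitude": 20}}
raw = json.dumps(packet, separators=(",", ":")).encode("utf-8")
with_dict = compress_packet(packet)
without_dict = zlib.compress(raw, 9)
print(len(raw), len(without_dict), len(with_dict))
```

Because each packet starts from the same dictionary, packets can be decompressed independently and in any order, which matters if they may be lost or reordered.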
(Addendum to all of the above: as Michael Kohne points out, sending JSON means you're probably sending all text, which means that you're underusing bandwidth that has the capability of sending a much wider range of characters than you're using. However, the problem of how to pack characters that fall into the range 0-127 into containers that hold values 0-255 is fairly simple and I think can be left as "an exercise for the reader", as they say.)
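For anyone who does not want to do the exercise, here is one possible Python sketch of the 7-bit packing. The function names `pack7` and `unpack7` are invented for this example, and it assumes the receiver learns the character count some other way (for instance a length prefix), because the zero padding at the end is otherwise ambiguous.

```python
def pack7(text: str) -> bytes:
    """Pack ASCII text at 7 bits per character, zero-padded to a whole byte."""
    data = text.encode("ascii")  # raises UnicodeEncodeError for chars above 127
    bits = 0
    for byte in data:
        bits = (bits << 7) | byte
    nbits = 7 * len(data)
    pad = (-nbits) % 8  # unused bits at the end of the last byte
    return (bits << pad).to_bytes((nbits + pad) // 8, "big")


def unpack7(blob: bytes, count: int) -> str:
    """Recover count characters from a blob produced by pack7."""
    pad = (-7 * count) % 8
    bits = int.from_bytes(blob, "big") >> pad
    chars = [(bits >> (7 * i)) & 0x7F for i in range(count - 1, -1, -1)]
    return bytes(chars).decode("ascii")


msg = '{"latitude":16}'
packed = pack7(msg)
print(len(msg), "characters ->", len(packed), "bytes")
```

This saves at most one byte in eight, so it is worth far less than removing the repeated key names, and it only works while the JSON stays pure ASCII.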