JSON compressed with gzip is a simple way to serialize data, and both are widely implemented across programming languages. The representation should also be portable across systems (is it?).
My question is whether json+gzip is good enough (less than 2x cost) compared to very efficient binary serialization methods. I'm looking at space and time costs while serializing various kinds of data.
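For concreteness, here is roughly the pipeline I mean, sketched in Python (the record fields are just placeholders):

import gzip
import json

# A list of records, the way data is naturally held in an application.
records = [{"small number": 0.1234, "large number": 1234000, "choice": "two"}] * 1000

# json+gzip: serialize to JSON text, then compress the UTF-8 bytes.
blob = gzip.compress(json.dumps(records).encode("utf-8"))

# And back: decompress, then parse.
restored = json.loads(gzip.decompress(blob).decode("utf-8"))
assert restored == records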
Serialising with json+gzip uses about 25% more space than rawbytes+gzip for numbers and objects. For limited-precision numbers (4 significant digits) the serialised size is roughly the same. For small-scale applications, json+gzip seems good enough in terms of data size, even when sending an array of records where each record fully spells out its field names and values (the common way of storing data in JavaScript).
Source for the experiment below: https://github.com/csiz/gzip-json-performance
I picked a million 64-bit floating point numbers. I assume these numbers come from some natural source, so I used an exponential distribution to generate them and then rounded them to 4 significant digits. Because JSON writes out the full decimal representation, I thought storing large numbers might incur a bigger cost (e.g. storing 123456.000000 vs 0.123456), so I checked both cases. I also checked serialising numbers that haven't been rounded.
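Here is a rough Python sketch of that setup (the repo linked above is the actual source of the measurements below; the tooling and compression settings there may differ, so treat this as illustrative):

import gzip
import json
import numpy as np

n = 1_000_000
rng = np.random.default_rng(0)

# Exponentially distributed "natural" values, rounded to 4 significant digits.
values = np.array([float(f"{v:.4g}") for v in rng.exponential(size=n)])
# For the large-number case, multiply by 1e6 before rounding;
# for the full-precision case, skip the rounding entirely.

json_gz = gzip.compress(json.dumps(values.tolist()).encode("utf-8"))
raw = values.tobytes()  # 8 bytes per 64-bit double
binary_gz = gzip.compress(raw)

print(f"json   {len(json_gz) / 2**20:.2f} MB  json/raw   {len(json_gz) / len(raw):.0%}")
print(f"binary {len(binary_gz) / 2**20:.2f} MB  binary/raw {len(binary_gz) / len(raw):.0%}")
print(f"json/binary {len(json_gz) / len(binary_gz):.2f}")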
Size used by compressed json is 9% larger vs compressed binary when serialising small numbers (order of magnitude around 1.0, so only a few digits to write down):
json   3.29 MB   json/raw   43%
binary 3.03 MB   binary/raw 40%
json/binary 1.09
Size used by compressed json is 17% smaller vs compressed binary when serialising large numbers (order of magnitude around 1000000, so more digits to write down). JSON can actually win here: the text of a large value rounded to 4 significant digits compresses very well, while a binary double costs the full 8 bytes before compression no matter how much the value was rounded:
json   2.58 MB   json/raw   34%
binary 3.10 MB   binary/raw 41%
json/binary 0.83
Size used by compressed json is 22% larger vs compressed binary when serialising full-precision doubles:
json   8.90 MB   json/raw   117%
binary 7.27 MB   binary/raw 95%
json/binary 1.22
For objects, I serialise them the usual lazy way in JSON: each object is stored as a complete record with its field names and values, and the "choice" enumeration has its value fully spelled out.
[
  {
    "small number": 0.1234,
    "large number": 1234000,
    "choice": "two"
  },
  ...
]
For the efficient binary representation, I vectorise the objects: I store the number of objects, then a contiguous vector of the small numbers, then one for the large numbers, then one for the choice enum. Since I assume the enum values are known and fixed, I store only the index into the enum.
n = 1e6
small number = binary([0.1234, ...])
large number = binary([1234000, ...])
choice = binary([2, ...]) # indexes into the enum ["zero", "one", ..., "four"]
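A minimal Python sketch of that layout (the linked repo is the authoritative implementation; packing each enum index into a single byte is my assumption here):

import gzip
import json
import struct

choices = ["zero", "one", "two", "three", "four"]
records = [{"small number": 0.1234, "large number": 1234000.0, "choice": "two"}] * 1000

n = len(records)
payload = struct.pack("<q", n)                                           # object count
payload += struct.pack(f"<{n}d", *(r["small number"] for r in records))  # contiguous doubles
payload += struct.pack(f"<{n}d", *(r["large number"] for r in records))
payload += struct.pack(f"<{n}B", *(choices.index(r["choice"]) for r in records))  # 1 byte per enum index

binary_gz = gzip.compress(payload)
json_gz = gzip.compress(json.dumps(records).encode("utf-8"))
print(f"json/binary {len(json_gz) / len(binary_gz):.2f}")

The vectorised layout wins by not repeating field names per record and by keeping values of the same type contiguous, which compresses better.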
Size used by compressed json is 27% larger vs compressed binary when storing objects:
json   8.36 MB   json/raw   44%
binary 6.59 MB   binary/raw 35%
json/binary 1.27