The JSON format natively doesn't support binary data. The binary data has to be escaped so that it can be placed into a string element (i.e. zero or more Unicode chars in double quotes using backslash escapes) in JSON.
An obvious method to escape binary data is to use Base64. However, Base64 has a high processing overhead. Also it expands 3 bytes into 4 characters which leads to an increased data size by around 33%.
One use case for this is the v0.8 draft of the CDMI cloud storage API specification. You create data objects via a REST-Webservice using JSON, e.g.
PUT /MyContainer/BinaryObject HTTP/1.1 Host: cloud.example.com Accept: application/vnd.org.snia.cdmi.dataobject+json Content-Type: application/vnd.org.snia.cdmi.dataobject+json X-CDMI-Specification-Version: 1.0 { "mimetype" : "application/octet-stream", "metadata" : [ ], "value" : "TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=", }
Are there better ways and standard methods to encode binary data into JSON strings?
The JSON format natively doesn't support binary data. The binary data has to be escaped so that it can be placed into a string element (i.e. zero or more Unicode chars in double quotes using backslash escapes) in JSON. An obvious method to escape binary data is to use Base64.
Also, textual media may reject certain binary values as non-text. Base64 encoding encodes binary data as values that can only be interpreted as text in textual media, and is free of any special characters and/or control characters, so that the data will be preserved across textual media as well.
Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding.
Base64 is a safe encoding for JSON. The downside to using Base64 is that it's quite verbose. It can add significant overhead to your file upload which means longer upload and more bandwidth used. If it's possible I suggest using a binary protocol that supports JSON types.
There are 94 Unicode characters which can be represented as one byte according to the JSON spec (if your JSON is transmitted as UTF-8). With that in mind, I think the best you can do space-wise is base85 which represents four bytes as five characters. However, this is only a 7% improvement over base64, it's more expensive to compute, and implementations are less common than for base64 so it's probably not a win.
You could also simply map every input byte to the corresponding character in U+0000-U+00FF, then do the minimum encoding required by the JSON standard to pass those characters; the advantage here is that the required decoding is nil beyond builtin functions, but the space efficiency is bad -- a 105% expansion (if all input bytes are equally likely) vs. 25% for base85 or 33% for base64.
Final verdict: base64 wins, in my opinion, on the grounds that it's common, easy, and not bad enough to warrant replacement.
See also: Base91 and Base122
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With