My project at work is using the Jackson JSON serializer to convert a bunch of Java objects into Strings in order to send them to REST services.
Some of these objects contain sensitive data, so I've written custom serializers to serialize these objects to JSON strings, then gzip them, then encrypt them using AES.
This turns the strings into byte arrays, so I use the Base64 encoder in Apache Commons Codec to convert the byte arrays into strings. The custom deserializers behind the REST interfaces reverse this process:
base64 decode -> decrypt -> decompress -> deserialize using default Jackson deserializer.
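For readers who want to see the round trip concretely, here is a minimal sketch of the pipeline described above. It is not the asker's actual code: the class name `SecurePayloadCodec` is hypothetical, the JSON is taken as an already-serialized string (the Jackson step is left out), `java.util.Base64` stands in for the Apache Commons Codec encoder, and AES-GCM with a random 12-byte IV prepended to the ciphertext is an assumption, since the question does not say which AES mode is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: gzip -> AES -> Base64 on the way out, reversed on the way in.
final class SecurePayloadCodec {
    private static final int IV_LENGTH = 12;   // bytes; assumed GCM nonce size
    private static final int TAG_BITS = 128;   // assumed GCM authentication tag size
    private static final SecureRandom RANDOM = new SecureRandom();

    public static String encode(String json, byte[] key)
            throws IOException, GeneralSecurityException {
        // 1. Compress the JSON text.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        }
        // 2. Encrypt the compressed bytes with a fresh random IV.
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
        byte[] cipherText = cipher.doFinal(compressed.toByteArray());
        // 3. Prepend the IV and turn the bytes into text.
        byte[] framed = ByteBuffer.allocate(iv.length + cipherText.length)
                .put(iv).put(cipherText).array();
        return Base64.getEncoder().encodeToString(framed);
    }

    public static String decode(String text, byte[] key)
            throws IOException, GeneralSecurityException {
        // base64 decode -> decrypt -> decompress
        byte[] framed = Base64.getDecoder().decode(text);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, framed, 0, IV_LENGTH));
        byte[] compressed = cipher.doFinal(framed, IV_LENGTH, framed.length - IV_LENGTH);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Calling `SecurePayloadCodec.decode(SecurePayloadCodec.encode(json, key), key)` with a 16-, 24-, or 32-byte key should return the original JSON string. Only the last step (bytes to text) is affected by the Base64 versus Ascii85 choice.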
Base64 encoding increases the size of the output (the gzip step in serialization is meant to help ameliorate this increase), so I checked Google to see if there was a more efficient alternative, which led me to this previous Stack Overflow thread that brought up Ascii85 encoding as a more efficient alternative: Base64 adds 33% to the size of the output, while Ascii85 adds 25%.
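The 33% and 25% figures follow from the group sizes: Base64 emits 4 characters per 3 bytes, Ascii85 emits 5 characters per 4 bytes. A small sketch of that arithmetic, assuming padded Base64 and plain Ascii85 without the `z` shortcut for all-zero groups or any framing delimiters (the class and method names are made up for illustration):

```java
// Hypothetical helper that computes encoded lengths from the group sizes.
final class EncodingOverhead {
    /** Base64: 4 characters per 3-byte group, with the last group padded. */
    public static long base64Length(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3);
    }

    /**
     * Ascii85: 5 characters per full 4-byte group; a trailing partial group
     * of k bytes is written as k + 1 characters.
     */
    public static long ascii85Length(long rawBytes) {
        long fullGroups = rawBytes / 4;
        long rest = rawBytes % 4;
        return 5 * fullGroups + (rest == 0 ? 0 : rest + 1);
    }

    /** Size increase of the encoded form relative to the raw bytes, in percent. */
    public static double overheadPercent(long encodedLength, long rawBytes) {
        return 100.0 * (encodedLength - rawBytes) / rawBytes;
    }
}
```

For 3,000 raw bytes this gives 4,000 Base64 characters and 3,750 Ascii85 characters, so the saving from switching is 250 characters out of 4,000.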
I found a few Java Ascii85 implementations, e.g. Apache PDFBox, but I'm a bit leery of using the encoding. It seems like hardly anybody is using or implementing it, which might just mean that Base64 has more inertia, or might instead mean that there's some wonky problem with Ascii85.
Does anybody know more on this subject? Are there any problems with Ascii85 that mean that I should use Base64 instead?
Base-122 encoding: a space-efficient UTF-8 binary-to-text encoding created as an alternative to Base64 in data URIs. Base-122 output is ~14% smaller than the equivalent Base64-encoded data.
Hex will take two characters for each byte - Base64 takes 4 characters for every 3 bytes, so it's more efficient than hex. Assuming you're using UTF-8 to encode the XML document, a 100K file will take 200K to encode in hex, or 133K in Base64.
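The hex versus Base64 numbers can be checked directly. This is a rough sketch rather than a benchmark: `HexVersusBase64` is a made-up class name, the hex encoder is hand-rolled to keep the example self-contained, and `java.util.Base64` is used for the Base64 side.

```java
import java.util.Base64;

// Hypothetical helper comparing the two text encodings of the same bytes.
final class HexVersusBase64 {
    /** Two lowercase hex digits per byte. */
    public static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16)); // high nibble
            sb.append(Character.forDigit(b & 0xF, 16));        // low nibble
        }
        return sb.toString();
    }

    /** Four characters per three bytes, padded at the end. */
    public static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }
}
```

With 100,000 input bytes, `toHex` returns 200,000 characters and `toBase64` returns 133,336, which matches the 200K and 133K figures above.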
Base 85 uses characters 33 ('!') through 117 ('u'). ASCII character 32 is a space, so it makes sense that you'd want to avoid that one. Since Base85 uses a consecutive range of characters, you can first convert a number to a pure mathematical radix-85 form, then add 33 to each digit to find its Base85 character.
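The "convert to radix 85, then add 33" step can be sketched for a single 4-byte group as follows. This is only an illustration of the digit mapping, not a full Ascii85 encoder: it skips partial trailing groups, the `z` shortcut for all-zero groups, and any delimiters, and the class name `Base85Group` is hypothetical.

```java
// Hypothetical helper: encodes one full 4-byte group as 5 Base85 characters.
final class Base85Group {
    public static String encodeGroup(byte[] four) {
        if (four.length != 4) {
            throw new IllegalArgumentException("expected exactly 4 bytes");
        }
        // Read the bytes as one big-endian unsigned 32-bit number.
        // A long is used so the value never goes negative.
        long value = 0;
        for (byte b : four) {
            value = (value << 8) | (b & 0xFF);
        }
        // Peel off radix-85 digits from least to most significant,
        // shifting each digit into the printable range starting at '!' (33).
        char[] out = new char[5];
        for (int i = 4; i >= 0; i--) {
            out[i] = (char) (value % 85 + 33);
            value /= 85;
        }
        return new String(out);
    }
}
```

For example, the four ASCII bytes of `"Man "` form the number 1298230816, whose radix-85 digits are 24, 73, 80, 78, 61; adding 33 to each gives `9jqo^`.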
The Base64 encoding is used to convert bytes that have binary or text data into ASCII characters. Encoding prevents the data from getting corrupted when it is transferred or processed through a text-only system.
Base64 is way more common. The difference in size really isn't that significant in most cases, and if you add compression at the HTTP level (which will compress the Base64) instead of within your payload, you may well find the difference goes away entirely.
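One way to get a feel for the HTTP-level compression point is to gzip Base64 text and compare sizes. The sketch below makes several assumptions: `GZIPOutputStream` stands in for whatever gzip the HTTP layer would apply, random bytes stand in for encrypted data, and `Base64OverGzip` is a made-up name. Actual savings depend on the server, client, and payload.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical experiment: how large is Base64 text after transport-level gzip?
final class Base64OverGzip {
    /** Returns the number of bytes produced by gzipping the given data. */
    public static int gzippedLength(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        }
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[30_000];
        new Random(42).nextBytes(raw);                  // stand-in for ciphertext
        byte[] base64 = Base64.getEncoder().encode(raw);
        System.out.println("raw bytes:      " + raw.length);
        System.out.println("base64 chars:   " + base64.length);
        System.out.println("gzipped base64: " + gzippedLength(base64));
    }
}
```

Because Base64 text uses only 64 distinct symbols, each 8-bit character carries about 6 bits of information, and a general-purpose compressor can reclaim much of the difference. The same reasoning applies to Ascii85 text, which is why the gap between the two tends to narrow once the transport compresses the body.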
Are there any problems with Ascii85 that mean that I should use Base64 instead?
I would strongly advise using base64 just because it's so much more widespread. It's pretty much the canonical way of representing binary data as text (unless you want to use hex, of course).