 

What text encoding scheme do you use when you have binary data that you need to send over an ASCII channel?

If you have binary data that you need to encode, what encoding scheme do you use?

I know about:

  • Hex encoding. Very simple, but quite verbose, expands one byte to two.
  • Base 64. Most common, not so verbose, expands three bytes to four.
  • Base 85. Not common, less verbose again, expands four bytes to five.
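To see the expansion ratios side by side, here is a small sketch that encodes the same bytes with each scheme using Python's standard `base64` module. `b16encode` is plain hex, `b64encode` is standard Base64, and `b85encode` is the Base85 flavour used in git-style binary diffs (Python also has `a85encode` for Adobe's Ascii85). The 60-byte random input is an arbitrary choice, picked so that no scheme needs padding.

```python
import base64
import os

data = os.urandom(60)  # 60 random bytes: a multiple of 3 and of 4, so no padding

encodings = {
    "hex (base16)": base64.b16encode(data),
    "base64": base64.b64encode(data),
    "base85": base64.b85encode(data),
}

for name, encoded in encodings.items():
    # Output characters per input byte
    print(f"{name:13} {len(encoded):4} chars  ratio {len(encoded) / len(data):.2f}")

# Every scheme round-trips back to the original bytes
assert base64.b16decode(encodings["hex (base16)"]) == data
assert base64.b64decode(encodings["base64"]) == data
assert base64.b85decode(encodings["base85"]) == data
```

With 60 input bytes this gives 120, 80 and 75 characters respectively, i.e. ratios of 2.00, 1.33 and 1.25.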

Are there any other encoding schemes in common use? If so, what are their advantages and disadvantages?

Edit: This is useful, for example, when trying to store arbitrary data in a cookie. Cookies can only store text, not arbitrary data, so you need to convert it in some way, preferably with a way to convert it back. Further, assume that you are using a stateless server so that you cannot save the state on the server and just put an identifier into the cookie. Of course, if you do this you would also need some way of verifying that what the user is passing back to you is what you passed to the user, for example a signature.
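One way to do the "encode plus signature" part is sketched below. It is only an illustration under stated assumptions: it uses HMAC-SHA256 as the signature and hex for both the payload and the MAC, and `SECRET_KEY`, `make_cookie_value` and `read_cookie_value` are hypothetical names, not anything from the question. Note that this signs the data but does not encrypt it; the user can still read the payload.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"replace-with-a-long-random-secret"


def make_cookie_value(payload: bytes) -> str:
    """Hex-encode the payload and append a hex HMAC-SHA256 of it."""
    mac = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.hex() + "." + mac


def read_cookie_value(value: str) -> Optional[bytes]:
    """Return the payload if the signature matches, otherwise None."""
    try:
        data_hex, mac = value.split(".", 1)  # ValueError if there is no '.'
        payload = bytes.fromhex(data_hex)    # ValueError if not valid hex
    except ValueError:
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    # compare_digest compares in constant time to avoid timing leaks
    if hmac.compare_digest(mac.encode(), expected.encode()):
        return payload
    return None


value = make_cookie_value(b"user=42")
print(value)
print(read_cookie_value(value))
```

Any change to either half of the value (for example flipping a hex digit of the payload) makes `read_cookie_value` return `None`.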

Also, since the current consensus is that you should use Base64 because it is widespread, I will point out that this is what I use... I am just curious whether anyone uses anything else, and if so, why.

Edit: Just in case someone stumbles across this, if you do want to use Base64 to store data in a cookie, you need to use a modified Base64 implementation. See this answer for the reason why.

Asked by Paul Wagland, Jan 18 '10 23:01




1 Answer

For encoding cookie values, you need to be careful. See this older answer:

With Version 0 cookies, values should not contain white space, brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, at signs, colons, and semicolons. Empty values may not behave the same way on all browsers.

Base64 encoding can generate = symbols (padding) for certain inputs, and the standard Base64 alphabet also includes /; neither is technically permitted in cookies (Version 0 cookies, anyway, which are the most widely supported). In practice, I suspect the = will actually work fine, but maybe not.
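A quick way to check an encoded value against the Version 0 rule quoted above is sketched here. Reading "brackets" as `()`, `<>`, `[]` and `{}` is an assumption on my part, and `cookie_v0_safe` is a hypothetical helper name.

```python
import base64

# Characters listed in the quoted Version 0 rule; "brackets" is read here as
# (), <>, [] and {} (an assumption), plus = , " / ? @ : ;
FORBIDDEN = set('()<>[]{}=,"/?@:;')


def cookie_v0_safe(value: str) -> bool:
    """True if value contains no whitespace and none of the listed characters."""
    return not any(ch.isspace() or ch in FORBIDDEN for ch in value)


# The bytes 0xFB 0xFF encode to "+/8=" in standard Base64: '/' and '=' are both listed
encoded = base64.b64encode(b"\xfb\xff").decode("ascii")
print(encoded, cookie_v0_safe(encoded))  # +/8= False
```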

To be absolutely sure that your encoded binary is cookie-compatible, I would suggest basic hex encoding, which is the safest option (e.g. in Java).
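For comparison, a Python equivalent of the hex approach is just the built-in `bytes.hex()` and `bytes.fromhex()`. The wrapper names below are hypothetical; the point is that the output only ever contains the characters `0-9` and `a-f`, none of which appear in the Version 0 list.

```python
import os


def to_cookie_hex(data: bytes) -> str:
    """Hex-encode: two lowercase hex digits per input byte."""
    return data.hex()


def from_cookie_hex(value: str) -> bytes:
    """Decode a hex string back to the original bytes."""
    return bytes.fromhex(value)


data = os.urandom(32)
value = to_cookie_hex(data)
assert from_cookie_hex(value) == data
# Only hex digits, so nothing on the cookie "should not contain" list
assert set(value) <= set("0123456789abcdef")
```

The cost is the 2x size increase mentioned in the question, which matters because browsers cap cookie size.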

edit: As @Paul helpfully pointed out, there is a modified version of Base 64 that is "URL safe" (and, I assume, "cookie safe"). Using a modified version of a standard algorithm rather dilutes its charm, mind you.
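In Python the URL-safe variant is `base64.urlsafe_b64encode`, which swaps `+` for `-` and `/` for `_`. A small sketch, reusing input bytes that hit both of those characters:

```python
import base64

data = b"\xfb\xff"
standard = base64.b64encode(data)          # b'+/8='
url_safe = base64.urlsafe_b64encode(data)  # b'-_8='

assert base64.urlsafe_b64decode(url_safe) == data
print(standard, url_safe)
```

Note that the URL-safe output still ends in `=` padding, so on its own it does not avoid the `=` issue.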

edit: @shoosh pointed out that the = is only padding at the end of the Base64 string, so you could trim the trailing = characters, set the cookie, then re-append them when you need to decode it.
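The trim-and-restore trick can be sketched like this, here combined with the URL-safe alphabet so that `/` is avoided as well. The helper names are hypothetical. Since Base64 output length is always a multiple of 4, the decoder can work out how many `=` characters were removed from the length alone.

```python
import base64


def encode_unpadded(data: bytes) -> str:
    """URL-safe Base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_unpadded(value: str) -> bytes:
    """Re-append the padding the decoder expects, then decode."""
    padding = "=" * (-len(value) % 4)  # 0, 1 or 2 characters in practice
    return base64.urlsafe_b64decode(value + padding)


token = encode_unpadded(b"\xfb\xff")
print(token)                   # -_8
print(decode_unpadded(token))  # b'\xfb\xff'
```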

Answered by skaffman, Sep 30 '22 18:09