I've noticed that PHP's base64_encode uses '=' as a padding character. According to Wikipedia, the different Base64 variants use either '=' or no padding at all. However, the CLI base64 command as well as openssl enc -base64 appear to use 'K' as the padding. I am looking for information as to why, and which implementations they use.
echo base64_encode('hello'); // aGVsbG8=
echo hello | base64 -i - # aGVsbG8K
openssl enc -base64 <<< hello # aGVsbG8K
With padding, a Base64 string always has a length that is a multiple of 4 (if it doesn't, the string has certainly been corrupted), so code can process the string in a loop that handles 4 characters at a time, always converting 4 input characters to three or fewer output bytes.
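The 4-characters-at-a-time loop described above can be sketched in Python as follows. This is only an illustration, not how PHP, base64 or openssl implement decoding; the function name decode_padded is made up for this example, and the sketch does not check that '=' appears only at the very end.

```python
import string

# Standard Base64 alphabet: index 0-63 maps to one character.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def decode_padded(encoded: str) -> bytes:
    """Decode padded Base64 by consuming exactly 4 characters per iteration."""
    if len(encoded) % 4 != 0:
        raise ValueError("length is not a multiple of 4; input is corrupted")
    out = bytearray()
    for i in range(0, len(encoded), 4):
        quad = encoded[i:i + 4]
        pad = quad.count("=")
        # Pack four 6-bit values into one 24-bit integer ('=' counts as zero bits).
        value = 0
        for ch in quad:
            value = (value << 6) | (0 if ch == "=" else ALPHABET.index(ch))
        # 24 bits are 3 bytes; each '=' stands in for one byte that never existed.
        out += value.to_bytes(3, "big")[:3 - pad]
    return bytes(out)

print(decode_padded("aGVsbG8="))  # b'hello'
print(decode_padded("aGVsbG8K"))  # b'hello\n'
```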
Base64 fills the last character with zero bits when the input does not end on a 6-bit boundary. It is possible to hide information in these fill bits, as they are disregarded upon decoding. To hide larger amounts efficiently, multiple strings need to be encoded, since a single Base64-encoded string can carry only 4, 2 or 0 bits of secret text.
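The 4, 2 or 0 figure follows from simple arithmetic on the input length. A small Python sketch (the helper name spare_padding_bits is hypothetical) computes how many zero-fill bits end up in the last Base64 character:

```python
def spare_padding_bits(n_bytes: int) -> int:
    """Zero-fill bits needed to round 8 * n_bytes up to a multiple of 6."""
    return -(8 * n_bytes) % 6

for n in (3, 4, 5):
    print(n, spare_padding_bits(n))
# 3 0
# 4 4
# 5 2
```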
Since Base64 works on 24-bit groups, padding is needed when the length of the original binary data is not a multiple of 24 bits.
Remember I said the "=" character has a special meaning? It's used to pad the end of a base64 encoded string. Remember that Base64 encodes 24 bits in chunks of 6 bits equaling 4 base64 characters. We get our last group of 24 bits by first adding zero bits to fill in the remaining bits of our last group of 6 bits.
K is not a padding character. It is the encoding of the newline that the shell commands add to the input: echo appends one, and so does the <<< here-string.
echo hello | openssl enc -base64 # aGVsbG8K
echo -n hello | openssl enc -base64 # aGVsbG8=
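The same effect can be reproduced outside the shell. A minimal Python check, using the standard base64 module, encodes the input with and without the trailing newline:

```python
import base64

# The bytes PHP's base64_encode('hello') sees: no trailing newline.
without_newline = base64.b64encode(b"hello").decode("ascii")

# The bytes `echo hello` pipes into base64/openssl: newline included.
with_newline = base64.b64encode(b"hello\n").decode("ascii")

print(without_newline)  # aGVsbG8=
print(with_newline)     # aGVsbG8K
```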
UPDATE:
Base64 splits the provided bitstream into 6-bit chunks instead of 8-bit chunks. A special table (different from the ASCII table) of 64 printable characters (hence the name of the encoding) is then used to convert these 6-bit chunks to characters.
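For reference, the standard table is A-Z, a-z, 0-9, then + and /. A short Python sketch of the lookup (the helper name sextet_to_char is made up for this example):

```python
import string

# Standard Base64 alphabet (RFC 4648): index 0-63 maps to one character.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def sextet_to_char(bits: str) -> str:
    """Convert a 6-bit string such as '011010' to its Base64 character."""
    return ALPHABET[int(bits, 2)]

print(len(ALPHABET))             # 64
print(sextet_to_char("011010"))  # a
print(sextet_to_char("001010"))  # K
```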
Let's see it in practice (print-bits and print-b64-bits are imaginary commands).
With newline:
echo hello | print-bits
# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)
echo hello | print-b64-bits
# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8) 001010 (K)
No newline:
echo -n hello | print-bits
# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o)
echo -n hello | print-b64-bits
# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8)
In the latter case there are 7 output characters. One = character needs to be appended to make 8 (a multiple of 4).
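The imaginary print-bits and print-b64-bits commands can be approximated in Python. This is only a sketch for experimenting; the function names mirror the imaginary commands and do not belong to any real tool.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def print_bits(data: bytes) -> str:
    """Return each input byte as an 8-bit group."""
    return " ".join(f"{byte:08b}" for byte in data)

def print_b64_bits(data: bytes) -> str:
    """Return each 6-bit group with its Base64 character, then any '=' padding."""
    bitstream = "".join(f"{byte:08b}" for byte in data)
    # Zero-fill the final group so that it is a full 6 bits.
    bitstream += "0" * (-len(bitstream) % 6)
    groups = [bitstream[i:i + 6] for i in range(0, len(bitstream), 6)]
    parts = [f"{g} ({ALPHABET[int(g, 2)]})" for g in groups]
    # Append '=' until the character count is a multiple of 4.
    parts += ["="] * (-len(groups) % 4)
    return " ".join(parts)

print(print_b64_bits(b"hello\n"))  # ends with: 111100 (8) 001010 (K)
print(print_b64_bits(b"hello"))    # ends with: 111100 (8) =
```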
Note: A newline at the end is not always converted to K. It could be o or g. This depends on the number of input bytes. Consider the case below:
echo helllo | print-bits
# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)
echo helllo | print-b64-bits
# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 110001 (x) 101111 (v) 000010 (C) 10 (g)
In the case above, the last 2 bits are first padded with zeros to form a full 6-bit chunk (100000), and then converted to a printable character. The last output character is now g.
And since there are 10 output characters, two = need to be added to make 12 (a multiple of 4).
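To see how the trailing newline shows up for different input lengths, here is a small Python sketch (the helper name newline_tail is hypothetical). It prints the input length modulo 3 next to the last four Base64 characters:

```python
import base64

def newline_tail(text: str) -> str:
    """Return the last four Base64 characters of text plus a trailing newline."""
    encoded = base64.b64encode((text + "\n").encode("ascii")).decode("ascii")
    return encoded[-4:]

for word in ("hello", "hell", "helllo"):
    print(len(word) % 3, newline_tail(word))
# 2 bG8K
# 1 bAo=
# 0 Cg==
```

So the newline ends up as K when it is the third byte of a 3-byte group, as o= when it is the second, and as Cg== when it starts a new group.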