
Why do base64/openssl use a padding character of 'K' instead of '='?

Tags:

base64

I've noticed that PHP's base64_encode uses '=' as a padding character. According to Wikipedia, the different variants use either '=' or no padding at all. However, the CLI base64 command as well as openssl enc -base64 use 'K' as the padding. I am looking for information on why, and on which implementations they use.

echo base64_encode('hello'); // aGVsbG8=
echo hello | base64 -i - // aGVsbG8K
openssl enc -base64 <<< hello   // aGVsbG8K
asked Aug 01 '17 20:08 by cyberwombat




1 Answer

K is not a padding character. It comes from the trailing newline that the shell commands add to the input: echo appends a newline, and so does the <<< here-string.

echo hello | openssl enc -base64 # aGVsbG8K
echo -n hello | openssl enc -base64 # aGVsbG8=
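The same effect can be reproduced without a shell. The following is a minimal Python sketch using the standard library's `base64.b64encode`. The trailing newline is written explicitly as `\n` to mimic what `echo` appends:

```python
import base64

# What `echo hello` pipes into the encoder: the word plus a trailing newline.
with_newline = base64.b64encode(b"hello\n").decode("ascii")

# What `echo -n hello` pipes in: the word only.
without_newline = base64.b64encode(b"hello").decode("ascii")

print(with_newline)     # aGVsbG8K
print(without_newline)  # aGVsbG8=

# Decoding the "K" version gives the newline back, so K carries data, not padding.
print(repr(base64.b64decode(with_newline)))  # b'hello\n'
```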

UPDATE:

Technical explanation

Base64 splits the input bitstream into 6-bit chunks instead of 8-bit bytes. Each chunk (a value from 0 to 63) is then converted to a character using a special table of 64 printable characters (hence the name), not the ASCII table: A-Z for 0-25, a-z for 26-51, 0-9 for 52-61, + for 62 and / for 63.
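The 64-character table is the standard alphabet defined in RFC 4648. Below is a minimal Python sketch that builds it from the `string` module's constants and looks up a few of the 6-bit chunks used in this answer. The function name `chunk_to_char` is illustrative only:

```python
import string

# Standard Base64 alphabet (RFC 4648): index = 6-bit value, character = output symbol.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def chunk_to_char(bits: str) -> str:
    """Map a 6-character string of '0'/'1' digits to its Base64 character."""
    if len(bits) != 6:
        raise ValueError("expected exactly 6 bits")
    return B64_ALPHABET[int(bits, 2)]


print(len(B64_ALPHABET))        # 64
print(chunk_to_char("011010"))  # a  (value 26)
print(chunk_to_char("001010"))  # K  (value 10)
print(chunk_to_char("111100"))  # 8  (value 60)
```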

Let's see it in practice. (print-bits and print-b64-bits are imaginary commands.)
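Since print-bits and print-b64-bits do not exist, here is one hypothetical way to write them as Python functions. The names `print_bits` and `print_b64_bits` are made up for illustration. A final chunk shorter than 6 bits is shown unpadded, as in the helllo listing, but is looked up after zero-filling it to 6 bits. Padding '=' characters are not shown:

```python
import string

# Standard Base64 alphabet (RFC 4648).
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def byte_label(b: int) -> str:
    """Printable label for a byte; the newline is shown as a literal backslash-n."""
    return "\\n" if b == 0x0A else chr(b)


def print_bits(data: bytes) -> str:
    """Hypothetical print-bits: every input byte as 8 bits plus its character."""
    return " ".join(f"{b:08b} ({byte_label(b)})" for b in data)


def print_b64_bits(data: bytes) -> str:
    """Hypothetical print-b64-bits: the bit stream cut into 6-bit chunks."""
    stream = "".join(f"{b:08b}" for b in data)
    chunks = [stream[i:i + 6] for i in range(0, len(stream), 6)]
    # A short last chunk is zero-filled to 6 bits only for the table lookup.
    return " ".join(f"{c} ({B64_ALPHABET[int(c.ljust(6, '0'), 2)]})" for c in chunks)


print(print_bits(b"hello\n"))
print(print_b64_bits(b"hello\n"))
```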

With newline:

echo hello | print-bits

# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)

echo hello | print-b64-bits

# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8) 001010 (K)


No newline:

echo -n hello | print-bits

# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101111 (o)

echo -n hello | print-b64-bits

# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 111100 (8)

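In the latter case there are only 40 input bits, so the last chunk holds just 4 bits (1111) and is filled with two zero bits to give 111100 (8). That yields 7 output characters, so one = is appended to make 8 (a multiple of 4): aGVsbG8=.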
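The padding rule can be written as arithmetic. For n input bytes, the encoder emits ceil(8n / 6) data characters and then appends '=' until the length is a multiple of 4. The Python sketch below checks this against `base64.b64encode`. The helper name `b64_lengths` is illustrative:

```python
import base64


def b64_lengths(n_bytes: int) -> tuple[int, int]:
    """Return (data characters, '=' characters) for n_bytes of input."""
    data_chars = (8 * n_bytes + 5) // 6     # integer ceil(8n / 6)
    pad_chars = (4 - data_chars % 4) % 4    # fill up to a multiple of 4
    return data_chars, pad_chars


for sample in (b"hello", b"hello\n", b"helllo\n"):
    data_chars, pad_chars = b64_lengths(len(sample))
    encoded = base64.b64encode(sample).decode("ascii")
    # The real encoder's output is exactly the data characters plus the padding.
    assert len(encoded) == data_chars + pad_chars
    assert encoded.count("=") == pad_chars
    print(sample, encoded, data_chars, pad_chars)
```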

Note: a trailing newline is not always encoded as K. It can also end up as o or g, depending on the number of input bytes. Consider the case below:

echo helllo | print-bits

# 01101000 (h) 01100101 (e) 01101100 (l) 01101100 (l) 01101100 (l) 01101111 (o) 00001010 (\n)

echo helllo | print-b64-bits

# 011010 (a) 000110 (G) 010101 (V) 101100 (s) 011011 (b) 000110 (G) 110001 (x) 101111 (v) 000010 (C) 10 (g)

In this case the last 2 bits (10) are first padded with four zero bits to form 100000, and only then converted to a printable character. The last output character is therefore g.

And since there are now 10 output characters, two = characters need to be appended to make 12 (a multiple of 4), giving aGVsbGxvCg==.
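The claim that a trailing newline can end up as K, o or g can be checked directly. In this Python sketch, the words hell, hello and helllo are arbitrary inputs. They are chosen so that the newline lands at each of the three positions inside the final 3-byte group:

```python
import base64

for word in (b"hell", b"hello", b"helllo"):
    data = word + b"\n"  # what `echo word` would pipe into base64
    encoded = base64.b64encode(data).decode("ascii")
    last_data_char = encoded.rstrip("=")[-1]
    # len(data) % 3 tells where the newline sits in the final group:
    # 0 -> third byte (K), 2 -> second byte (o), 1 -> first byte (g).
    print(word.decode(), len(data) % 3, encoded, last_data_char)
```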

answered Sep 19 '22 23:09 by Marinos An