
Is a base64 encoded string unique?

I can't find an answer to this. If I encode a string with Base64, will the encoded output be unique for that string? I ask because I want to create a token that contains user information, so I need to make sure the output is unique for the information it contains.

For example, if I encode "UnqUserId:987654321 Timestamp:01/02/03", will the result be unique, so that no matter what other user ID I put in there, there will never be a collision?
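One quick way to see that standard Base64 never maps two different inputs to the same output is a round trip: decoding the encoded token gives back exactly the original string, so no information is lost. This sketch uses Python's standard `base64` module and assumes the token text is UTF-8 encoded; the helper names and the second user ID are hypothetical, made up for illustration.

```python
import base64

def encode_token(text: str) -> str:
    # Base64 operates on bytes, so encode the text as UTF-8 first (assumed token format)
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def decode_token(token: str) -> str:
    # Reverse both steps: Base64 -> bytes -> UTF-8 text
    return base64.b64decode(token).decode("utf-8")

original = "UnqUserId:987654321 Timestamp:01/02/03"
other = "UnqUserId:987654322 Timestamp:01/02/03"  # hypothetical second user

a = encode_token(original)
b = encode_token(other)
print(a)
print(b)

# Different inputs give different tokens, and each token decodes back to its own input
assert a != b
assert decode_token(a) == original
assert decode_token(b) == other
```

Note that the round trip also shows anyone holding the token can read its contents, and that uniqueness of the token still depends on the user ID and timestamp you put into it.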

asked May 24 '15 at 22:05 by user2924127


People also ask

What is a Base64 encoded string?

Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding.
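To make the "radix-64" idea concrete, here is a minimal from-scratch encoder, a teaching sketch rather than a replacement for a library. It takes the input 3 bytes (24 bits) at a time, splits each group into four 6-bit digits, and uses each digit as an index into the 64-character alphabet. The function name `b64_encode` is hypothetical; the standard library's `base64.b64encode` is only used to cross-check it.

```python
import base64

# The standard Base64 alphabet: digit values 0..63 map to these characters
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(data: bytes) -> str:
    out = []
    # Take 3 bytes (24 bits) at a time and emit up to four 6-bit digits
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Zero-fill a short final chunk so it can be treated as 24 bits
        n = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        digits = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # A chunk of k bytes needs k + 1 digits; the remaining slots become '='
        used = len(chunk) + 1
        out.extend(ALPHABET[d] for d in digits[:used])
        out.append("=" * (4 - used))
    return "".join(out)

# Cross-check against the standard library implementation
for sample in [b"", b"f", b"fo", b"foo", b"foob", bytes.fromhex("433356c1")]:
    assert b64_encode(sample) == base64.b64encode(sample).decode("ascii")
```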

What happens when you Base64 encode text?

Base64 represents arbitrary binary data using only printable ASCII characters, with no special or control characters, so the data survives intact when it passes through text-only media.

How do I know if a string is Base64 encoded?

In base64 encoding, the character set is A-Z, a-z, 0-9, '+' and '/'. If the final group has fewer than 4 characters, it is padded with '=' characters. In a regular expression, ^([A-Za-z0-9+/]{4})* matches zero or more complete 4-character base64 groups at the start of the string.
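Extending that regular expression to the whole string gives a syntactic check; a stricter alternative is to try decoding with validation turned on. Below is a sketch of both in Python, assuming standard padded Base64; the function names are hypothetical. `base64.b64decode(..., validate=True)` rejects characters outside the alphabet instead of silently discarding them, and raises `binascii.Error` (a subclass of `ValueError`) for bad input.

```python
import base64
import re

# Complete 4-character groups, optionally followed by one final group
# that ends in "==" (2 data characters) or "=" (3 data characters)
B64_RE = re.compile(r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?")

def looks_like_base64(s: str) -> bool:
    # fullmatch anchors at both ends, so trailing junk (even a newline) fails
    return B64_RE.fullmatch(s) is not None

def decodes_as_base64(s: str) -> bool:
    try:
        base64.b64decode(s, validate=True)
        return True
    except ValueError:  # covers binascii.Error and non-ASCII str input
        return False
```

Both checks can only tell you that a string *could* be Base64: ordinary text such as "abcd" passes too, and the empty string is the valid encoding of zero bytes.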

What does a Base64 string look like?

When padding is used, the length of a Base64-encoded string is always a multiple of 4. Only these characters are used by the encoding: "A" to "Z", "a" to "z", "0" to "9", "+" and "/". The string may end with up to two "=" padding characters ("=" is allowed only at the end).
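As a sanity check on these length and padding rules, this sketch computes the expected encoded length for n input bytes (4 characters for every started 3-byte group) and the number of "=" characters, then compares both against Python's `base64.b64encode`. It assumes the standard, padded form; the helper names are hypothetical.

```python
import base64

def encoded_length(n_bytes: int) -> int:
    # Every started 3-byte group becomes 4 output characters
    return 4 * ((n_bytes + 2) // 3)

def padding_chars(n_bytes: int) -> int:
    # 1 leftover byte -> "==", 2 leftover bytes -> "=", no leftover -> none
    return (3 - n_bytes % 3) % 3

# Check both formulas against the standard library for small inputs
for n in range(20):
    encoded = base64.b64encode(bytes(n))  # n zero bytes
    assert len(encoded) == encoded_length(n)
    assert len(encoded) % 4 == 0
    assert encoded.count(b"=") == padding_chars(n)
```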


1 Answer

Two years late, but here we go:

The short answer is yes: distinct binary (hex) values always encode to distinct base64 strings.

BUT the reverse does not hold: multiple base64 strings may decode to the same binary value.

This is because bytes are not aligned with base64 'digits'. A byte is 8 bits, while a single base64 digit represents 6 bits. Therefore, any value whose bit length is not a multiple of 6 can have multiple base64 representations (though a correctly implemented base64 encoder always produces the same one).

An example of this misalignment is the hex value 0x433356c1. This value is 32 bits long and base64-encodes to 'QzNWwQ=='. However, 32 is not a multiple of 6, so the value is not 6-bit aligned. So what happens? The base64 encoder appends four zero bits to the end of the binary representation, making the sequence 36 bits long and therefore 6-bit aligned. Those 36 bits become six base64 digits, and two '=' characters pad the output to eight characters.
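The padding step can be reproduced bit by bit. This sketch (the helper name `sextets` is hypothetical) writes the bytes of 0x433356c1 out in binary, appends zero bits until the length is a multiple of 6, and maps each 6-bit group to the standard alphabet.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def sextets(data: bytes) -> list[str]:
    # Write every byte as 8 binary digits
    bits = "".join(f"{byte:08b}" for byte in data)
    # Append zero bits until the length is a multiple of 6 (32 -> 36 here)
    bits += "0" * (-len(bits) % 6)
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]

groups = sextets(bytes.fromhex("433356c1"))
print(" ".join(groups))  # 010000 110011 001101 010110 110000 010000

digits = "".join(ALPHABET[int(g, 2)] for g in groups)
padded = digits + "=" * (-len(digits) % 4)  # pad output to a multiple of 4 characters
print(padded)  # QzNWwQ==
```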

When decoding, the base64 decoder has to produce a whole number of 8-bit bytes. It discards the extra padding bits and decodes only the first 32 bits. For example, 'QzNWwc==' and 'QzNWwQ==' are different base64 strings, but both decode to the same value, 0x433356c1. If we look carefully, we notice that the first 32 bits are the same for both of these encoded strings:

'QzNWwc==': 010000 110011 001101 010110 110000 011100
'QzNWwQ==': 010000 110011 001101 010110 110000 010000
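This can be checked with a decoder that, like CPython's `base64.b64decode` as far as I know, silently ignores the leftover bits instead of rejecting non-zero ones. The sketch below decodes both strings from the example above and compares the results.

```python
import base64

canonical = "QzNWwQ=="     # what a correct encoder produces for 0x433356c1
noncanonical = "QzNWwc=="  # same first 32 bits, non-zero leftover bits

print(base64.b64decode(canonical).hex())     # 433356c1
print(base64.b64decode(noncanonical).hex())  # 433356c1

# Two different strings, one decoded value
assert canonical != noncanonical
assert base64.b64decode(canonical) == base64.b64decode(noncanonical)
```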

The only difference is in the last four bits, which are ignored. Keep in mind that no base64 encoder should ever generate 'QzNWwc==', or any base64 value for 0x433356c1 other than 'QzNWwQ==', since the added padding bits should always be zeros.

In conclusion, it is safe to assume that a unique binary/hex value will always encode to a unique base64 representation when a correctly implemented base64 encoder is used. A 'collision' can only occur during decoding, and only if base64 strings were generated without zeroing the padding/alignment bits.
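If you decode tokens that come from outside and want the mapping to be one-to-one in both directions, one option (a sketch, not something required by the answer above) is to reject non-canonical input: decode the string, re-encode the result, and accept it only if it matches the original exactly. The function name is hypothetical.

```python
import base64

def is_canonical_base64(s: str) -> bool:
    """Return True if s decodes and re-encodes to exactly the same string."""
    try:
        raw = base64.b64decode(s, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    # A correct encoder zeroes the padding bits, so a non-canonical
    # string will not survive the round trip unchanged
    return base64.b64encode(raw).decode("ascii") == s

assert is_canonical_base64("QzNWwQ==")
assert not is_canonical_base64("QzNWwc==")  # non-zero padding bits
```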

answered Oct 02 '22 at 21:10 by seano