Why does Base64.decode produce the same byte array for different strings?

Question

I'm using URL-safe Base64 encoding to encode randomly generated byte arrays, but I have a problem with decoding. When I decode two different strings that are identical except for the last character, I get the same byte array. For example, both "dGVzdCBzdHJpbmr" and "dGVzdCBzdHJpbmq" decode to:

Array(116, 101, 115, 116, 32, 115, 116, 114, 105, 110, 106)

For encoding and decoding I use java.util.Base64 like this:

// encoding...
Base64.getUrlEncoder().withoutPadding().encodeToString(myString.getBytes())
// decoding...
Base64.getUrlDecoder().decode(base64String)

What is the reason for this collision? Can it also happen with characters other than the last one? And how can I fix it so that decoding returns a different byte array for each different string?

Harald K · Accepted Answer

The issue you are seeing is caused by the fact that the number of bytes in the result (11 bytes) doesn't completely fill the last character of the Base64-encoded string.

Remember that Base64 packs 8-bit bytes into 6-bit characters. The resulting string therefore needs exactly 11 * 8 / 6 = 14 2/3 characters. But you can't write a partial character, so the encoder emits 15 characters, and only the first 4 bits (2/3) of the last character are significant. The last two bits are not decoded. Thus all of:

dGVzdCBzdHJpbmo
dGVzdCBzdHJpbmp
dGVzdCBzdHJpbmq
dGVzdCBzdHJpbmr

decode to the same 11 bytes (116, 101, 115, 116, 32, 115, 116, 114, 105, 110, 106).
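To make this concrete, here is a minimal sketch that decodes the four strings above with the JDK decoder and prints the 6-bit value of each final character. The class name TrailingBitsDemo and the helpers decode and sextet are made up for illustration; only the java.util.Base64 calls come from the question.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of the original question or answer.
class TrailingBitsDemo {
    // URL-safe Base64 alphabet; a character's index is its 6-bit value.
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // Decodes an unpadded URL-safe Base64 string with the JDK decoder.
    static byte[] decode(String s) {
        return Base64.getUrlDecoder().decode(s);
    }

    // Returns the 6-bit value (0..63) of a URL-safe Base64 character.
    static int sextet(char c) {
        return ALPHABET.indexOf(c);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "dGVzdCBzdHJpbmo", "dGVzdCBzdHJpbmp",
            "dGVzdCBzdHJpbmq", "dGVzdCBzdHJpbmr"
        };
        for (String s : inputs) {
            int last = sextet(s.charAt(s.length() - 1));
            // Left-pad the binary form to 6 digits so the low two bits line up.
            String bits = String.format("%6s", Integer.toBinaryString(last)).replace(' ', '0');
            System.out.println(s + "  last char = " + bits + "  -> " + Arrays.toString(decode(s)));
        }
    }
}
```

The final characters o, p, q and r have the values 40 to 43, which differ only in their two lowest bits. Those are exactly the two bits that fall outside the 88 bits of payload, so the decoded bytes are identical.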

PS: Without padding, some decoders will try to decode the "last" byte as well, giving a 12-byte result (with a different last byte). This is the reason for my comment asking whether the withoutPadding() option is a good idea. Your decoder seems to handle this, though.
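If you need each accepted string to map to a distinct byte array, one possible approach (a sketch, not something java.util.Base64 offers directly) is to re-encode the decoded bytes and reject any input that doesn't round-trip to itself. The class StrictBase64 and the method decodeStrict below are hypothetical names; the sketch assumes input was produced with getUrlEncoder().withoutPadding(), as in the question.

```java
import java.util.Base64;

// Hypothetical helper, not a JDK class.
class StrictBase64 {
    // Decodes unpadded URL-safe Base64 and rejects non-canonical input,
    // i.e. input whose unused trailing bits are not zero.
    static byte[] decodeStrict(String s) {
        byte[] bytes = Base64.getUrlDecoder().decode(s);
        // Assumption: the canonical form is the unpadded URL-safe encoding.
        String canonical = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        if (!canonical.equals(s)) {
            throw new IllegalArgumentException("Non-canonical Base64 input: " + s);
        }
        return bytes;
    }

    public static void main(String[] args) {
        // The canonical spelling is accepted.
        System.out.println(decodeStrict("dGVzdCBzdHJpbmo").length);
        try {
            // Same bytes, but non-zero trailing bits: rejected.
            decodeStrict("dGVzdCBzdHJpbmr");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that this check also rejects padded input such as "dGVzdA==", because the comparison is against the unpadded form. Strings produced by your own encoder always pass, so this only matters for strings that were modified or generated elsewhere.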

Tags: java, encoding, base64, decoding

ovunccetin