I need to create a hash from a String containing users password. To create the hash, I use a byte array which I get by calling String.getBytes()
. But when I call this method with specified encoding, (such as UTF-8) on a platform where this is not the default encoding, the non-ASCII characters get replaced by a default character (if I understand the behaviour of getBytes() correctly) and therefore on such platform, I will get a different byte array, and eventually a different hash.
Since Strings are internally stored in UTF-16, will calling String.getBytes("UTF-16")
guarantee me that I get the same byte array on every platform, regardless of its default encoding?
Yes. Not only is it guaranteed to be UTF-16, but the byte order is defined too:
When decoding, the UTF-16 charset interprets the byte-order mark at the beginning of the input stream to indicate the byte-order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.
(The BOM isn't relevant when the caller doesn't ask for it, so String.getBytes(...)
won't include it.)
So long as you have the same string content - i.e. the same sequence of char
values - then you'll get the same bytes on every implementation of Java, barring bugs. (Any such bug would be pretty surprising, given that UTF-16 is probably the simplest encoding to implement in Java...)
The fact that UTF-16 is the native representation for char
(and usually for String
) is only relevant in terms of ease of implementation, however. For example, I'd also expect String.getBytes("UTF-8")
to give the same results on every platform.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With