I am about to start working on something that requires reading bytes and creating strings. The bytes being read represent UTF-16 strings. So just to test things out I wanted to convert a simple byte array in UTF-16 encoding to a string. The first 2 bytes in the array must represent the endianness and so must be either 0xff 0xfe or 0xfe 0xff. So I tried creating my byte array as follows:
byte[] bytes = new byte[] {0xff, 0xfe, 0x52, 0x00, 0x6F, 0x00};
But I got an error because 0xFF and 0xFE are too big to fit into a byte (because bytes are signed in Java). More precisely the error was that the int couldn't be converted to a byte. I know that I could just explicitly convert from int to byte with a cast and achieve the desired result, but that is not what my question is about.
Just to try something out I created a String, called getBytes("UTF-16") on it, and printed each of the bytes in the array. The output was slightly confusing because the first two bytes were 0xFFFFFFFE 0xFFFFFFFF, followed by 0x00 0x52 0x00 0x6F. (Obviously the endianness here is different from what I was trying to create above, but that is not important.)
Using this output I decided to try and create my byte array the same way:
byte[] bytes = new byte[] {0xffffffff, 0xfffffffe, 0x52, 0x00, 0x6F, 0x00};
And strangely enough it worked fine. So my question is, why does Java allow an integer value of 0xFFFFFF80 or greater to be automatically converted to a byte without an explicit cast, but anything equal to or greater than 0x80 requires an explicit cast?
The key thing to remember here is that int in Java is a signed value. When you assign 0xffffffff (which is 2^32 - 1), this is translated into a signed int of value -1; an int cannot actually represent something as large as 0xffffffff as a positive number.
So for values from 0x00 through 0x7F, and from 0xFFFFFF80 through 0xFFFFFFFF, the resulting int value is between -128 and 127, which can unambiguously be represented as a byte. Because these literals are compile-time constants, the compiler accepts the narrowing without a cast. Anything outside that range does not fit, and needs forcing with an explicit cast, losing data in the process.
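To make the range rule concrete, here is a minimal sketch. The class and method names (ByteLiteralDemo, fitsInByte, utf16Bytes, decode) are made up for illustration and are not from the question. It shows which literals are accepted as byte values without a cast, and builds the little-endian array from the question using casts for the two BOM bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
class ByteLiteralDemo {
    // True when an int value lies inside the byte range [-128, 127].
    static boolean fitsInByte(int value) {
        return value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE;
    }

    // The array from the question: 0xff and 0xfe are the ints 255 and 254,
    // so they need an explicit cast; the remaining literals already fit.
    static byte[] utf16Bytes() {
        return new byte[] {(byte) 0xff, (byte) 0xfe, 0x52, 0x00, 0x6F, 0x00};
    }

    // Decodes the array; the UTF-16 charset reads the byte order from the BOM.
    static String decode() {
        return new String(utf16Bytes(), StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte a = 0x7f;         // int 127: fits, no cast needed
        byte b = 0xffffff80;   // int -128: fits, no cast needed
        byte c = 0xffffffff;   // int -1: fits, no cast needed
        // byte d = 0x80;      // int 128: out of range, does not compile
        byte e = (byte) 0x80;  // explicit cast keeps only the low 8 bits

        System.out.println(a + " " + b + " " + c + " " + e);
        System.out.println(fitsInByte(0x80) + " " + fitsInByte(0xffffff80));
        System.out.println(decode());
    }
}
```

Note that (byte) 0xff and the uncast literal 0xffffffff end up as the same byte value, -1, which is why the second array in the question compiled.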
If you write a number without a type suffix (such as the L in 1234L, which makes it a long), the compiler treats it as an int. The value 0xffffffff is therefore an int with value -1, which can be assigned to a byte without a cast or an error.
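As a small sketch of the difference the suffix makes (the class and method names are invented for this example), the same hex digits give two different values depending on whether they are read as an int or as a long:

```java
// Hypothetical demo class; the names here are illustrative only.
class LiteralTypeDemo {
    // No suffix: the literal is an int, and the bit pattern means -1.
    static int asInt() {
        return 0xffffffff;
    }

    // L suffix: the literal is a long, wide enough to hold 2^32 - 1.
    static long asLong() {
        return 0xffffffffL;
    }

    public static void main(String[] args) {
        System.out.println(asInt());   // prints -1
        System.out.println(asLong());  // prints 4294967295

        byte ok = 0xffffffff;          // int constant -1 fits in a byte
        // byte bad = 0xffffffffL;     // does not compile: long constants are never narrowed implicitly
        System.out.println(ok);        // prints -1
    }
}
```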