If I convert a character to a byte and then back to a char, that character mysteriously disappears and becomes something else. How is this possible?
This is the code:
    char a = 'È';                                  // line 1
    byte b = (byte) a;                             // line 2
    char c = (char) b;                             // line 3
    System.out.println((char) c + " " + (int) c);
Until line 2 everything is fine:

In line 1 I could print a in the console and it would show È.

In line 2 I could print b in the console and it would show -56, which is 200 read as an unsigned value, since byte is signed. And 200 is È, so it's still fine.

But what's wrong in line 3? c becomes something else, and the program prints ? 65480, which is something completely different.

What should I write in line 3 in order to get the correct result?
The main difference between the byte and char data types is that byte stores raw binary data while char stores characters, i.e. text data. You can store a character literal in a char variable, e.g. char a = 'a'; a character literal is enclosed in single quotes.

The byte type takes 1 byte of memory (8 bits), which can express 2^8 = 256 values. Since byte is signed, it can hold both positive and negative values, with a range from -128 to 127.

A char, by contrast, is made up of 2 bytes, because Java internally uses UTF-16. For instance, if a String contains a word in the English language, the leading 8 bits of every char will be 0, as an ASCII character can be represented using a single byte.

Syntax: byte by = (byte) ch; Here, ch is the char variable to be converted to a byte. The cast tells the compiler to narrow the char down to its byte equivalent value.
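As a small, self-contained sketch of that cast (the class and variable names here are just for illustration):

    public class CharToByte {
        public static void main(String[] args) {
            char ch = 'È';        // U+00C8, code unit value 200
            byte by = (byte) ch;  // narrowing cast keeps only the low 8 bits: 0xC8

            System.out.println((int) ch); // 200
            System.out.println(by);       // -56, because byte is signed
        }
    }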
A character in Java is a UTF-16 code unit, which is treated as an unsigned number. So when you perform c = (char) b, the value you get is 2^16 - 56, i.e. 65536 - 56 = 65480.

Or more precisely: the byte is first converted to a signed integer with the value 0xFFFFFFC8 using sign extension in a widening conversion. This in turn is then narrowed down to 0xFFC8 when casting to a char, which translates to the positive number 65480.
From the language specification:
5.1.4. Widening and Narrowing Primitive Conversion
First, the byte is converted to an int via widening primitive conversion (§5.1.2), and then the resulting int is converted to a char by narrowing primitive conversion (§5.1.3).
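A minimal sketch of those two steps, using Integer.toHexString to make the sign extension and the narrowing visible (variable names are just for illustration):

    public class ByteToChar {
        public static void main(String[] args) {
            byte b = (byte) 'È';      // 0xC8 stored as the signed value -56

            int widened = b;          // widening conversion, sign extension: 0xFFFFFFC8
            char narrowed = (char) b; // then narrowed to the low 16 bits: 0xFFC8

            System.out.println(Integer.toHexString(widened));  // ffffffc8
            System.out.println(Integer.toHexString(narrowed)); // ffc8
            System.out.println((int) narrowed);                // 65480
        }
    }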
To get the right result, use char c = (char) (b & 0xFF), which first converts the byte value of b to the positive integer 200 by using a mask, zeroing the top 24 bits after the widening conversion: 0xFFFFFFC8 becomes 0x000000C8, i.e. the positive number 200 in decimal.
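Applied to the original snippet (a sketch, assuming a console encoding that can display È):

    public class RoundTrip {
        public static void main(String[] args) {
            char a = 'È';
            byte b = (byte) a;
            char c = (char) (b & 0xFF); // the mask restores the unsigned value 200
            System.out.println(c + " " + (int) c); // prints: È 200
        }
    }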
The above is a direct explanation of what happens during conversion between the byte, int and char primitive types.
If you want to encode/decode characters from bytes, use Charset, CharsetEncoder, CharsetDecoder or one of the convenience methods such as new String(byte[] bytes, Charset charset) or String#getBytes(Charset charset). You can get the character set (such as UTF-8 or Windows-1252) from StandardCharsets.
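For example, here is a minimal round trip through an explicit charset; ISO-8859-1 is chosen because it maps È to the single byte 0xC8:

    import java.nio.charset.StandardCharsets;

    public class CharsetExample {
        public static void main(String[] args) {
            String s = "È";

            // Encode: 'È' is U+00C8, a single byte 0xC8 in ISO-8859-1
            byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
            System.out.println(bytes[0]); // -56 (0xC8 read as a signed byte)

            // Decode with the same charset to get the original text back
            String decoded = new String(bytes, StandardCharsets.ISO_8859_1);
            System.out.println(decoded);  // È
        }
    }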