 

Byte and char conversion in Java

If I convert a character to a byte and then back to a char, that character mysteriously disappears and becomes something else. How is this possible?

This is the code:

char a = 'È';       // line 1
byte b = (byte)a;   // line 2
char c = (char)b;   // line 3
System.out.println((char)c + " " + (int)c);

Until line 2 everything is fine:

  • In line 1 I could print "a" in the console and it would show "È".

  • In line 2 I could print "b" in the console and it would show -56, which corresponds to 200 because byte is signed (-56 + 256 = 200). And 200 is "È". So it's still fine.

But what's wrong in line 3? "c" becomes something else and the program prints ? 65480. That's something completely different.

What should I write in line 3 in order to get the correct result?

asked Jul 28 '13 by user1883212


People also ask

What is the difference between byte and char in Java?

The main difference between the byte and char data types is that byte is used to store raw binary data while char is used to store characters or text data. You can store a character literal in a char variable, e.g. char a = 'a'; a character literal is enclosed in single quotes.
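For illustration, a minimal sketch of that difference (the class name ByteVsChar is made up for this example):

public class ByteVsChar {
    public static void main(String[] args) {
        byte raw = 65;       // raw binary data: just the number 65
        char letter = 'A';   // text data: the character 'A', whose UTF-16 code unit is 65
        System.out.println(raw + " " + letter);   // prints: 65 A
    }
}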

Is char 1 or 2 bytes?

In C and C++, the char type takes 1 byte of memory (8 bits), which allows expressing 2^8 = 256 values; when signed, it can hold both positive and negative values, with a range of -128 to 127. In Java, however, char is 2 bytes (16 bits) and unsigned; the 1-byte signed type is byte.

Why does Java use 2 bytes for char?

Every char is made up of 2 bytes because Java internally uses UTF-16. For instance, if a String contains a word in the English language, the leading 8 bits of every char will all be 0, since an ASCII character can be represented using a single byte.
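A small sketch of that, encoding an ASCII letter with the standard UTF-16BE charset (the class name Utf16Demo is made up for this example):

import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    public static void main(String[] args) {
        // 'A' (code point 65) becomes two bytes in UTF-16BE: 00 41
        byte[] bytes = "A".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(bytes.length);                      // 2
        System.out.printf("%02X %02X%n", bytes[0], bytes[1]);  // 00 41
    }
}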

How do you turn a char into a byte?

Syntax: byte by = (byte) ch; Here, ch is the char variable to be converted into a byte. The cast tells the compiler to convert the char into its byte equivalent value.
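A short sketch of that cast (the class name CharToByte is illustrative only); note that code units above 127 wrap into the negative byte range:

public class CharToByte {
    public static void main(String[] args) {
        char ch = 'A';
        byte by = (byte) ch;             // fits: 'A' is 65
        System.out.println(by);          // 65
        System.out.println((byte) 'È');  // -56: 200 no longer fits in a signed byte
    }
}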


1 Answer

A character in Java is a Unicode code unit which is treated as an unsigned number. So if you perform c = (char)b, the value you get is 2^16 - 56, i.e. 65536 - 56 = 65480.

Or more precisely, the byte is first converted to a signed integer with the value 0xFFFFFFC8 using sign extension in a widening conversion. This in turn is then narrowed down to 0xFFC8 when casting to a char, which translates to the positive number 65480.

From the language specification:

5.1.4. Widening and Narrowing Primitive Conversion

First, the byte is converted to an int via widening primitive conversion (§5.1.2), and then the resulting int is converted to a char by narrowing primitive conversion (§5.1.3).
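To make those intermediate values visible, here is a small runnable trace of the steps described above (the class name ConversionTrace is made up for illustration):

public class ConversionTrace {
    public static void main(String[] args) {
        char a = 'È';        // code point 200 (0x00C8)
        byte b = (byte) a;   // narrowed to -56 (0xC8)
        int widened = b;     // sign-extended to 0xFFFFFFC8
        char c = (char) b;   // widened to int, then narrowed to 0xFFC8 = 65480
        System.out.printf("%d 0x%08X %d%n", b, widened, (int) c);
        // prints: -56 0xFFFFFFC8 65480
    }
}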


To get the right result, use char c = (char) (b & 0xFF), which first converts the byte value of b to the positive integer 200 by using a mask, zeroing the top 24 bits after conversion: 0xFFFFFFC8 becomes 0x000000C8, i.e. the positive number 200 in decimal.
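A minimal sketch of that fix (the class name MaskFix is made up for this example):

public class MaskFix {
    public static void main(String[] args) {
        byte b = (byte) 'È';           // -56
        char c = (char) (b & 0xFF);    // 0xFFFFFFC8 & 0xFF == 0xC8 == 200
        System.out.println(c + " " + (int) c);   // prints: È 200
    }
}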


Above is a direct explanation of what happens during conversion between the byte, int and char primitive types.

If you want to encode/decode characters from bytes, use Charset, CharsetEncoder, CharsetDecoder or one of the convenience methods such as new String(byte[] bytes, Charset charset) or String#getBytes(Charset charset). Common character sets such as UTF-8 are available from StandardCharsets; others such as Windows-1252 can be looked up with Charset.forName.
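A small sketch of that approach, assuming the Windows-1252 charset is available on your JVM (the class name CharsetRoundTrip is illustrative only):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    public static void main(String[] args) {
        String original = "È";
        // 'È' is two bytes in UTF-8 (C3 88) but one byte in Windows-1252 (C8)
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] cp1252 = original.getBytes(Charset.forName("Windows-1252"));
        // Decoding with the same charset restores the original text
        System.out.println(new String(utf8, StandardCharsets.UTF_8));            // È
        System.out.println(new String(cp1252, Charset.forName("Windows-1252"))); // È
    }
}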

answered Sep 17 '22 by Maarten Bodewes