 

Byte and char conversion in Java

If I convert a character to a byte and then back to a char, that character mysteriously disappears and becomes something else. How is this possible?

This is the code:

char a = 'È';       // line 1
byte b = (byte)a;   // line 2
char c = (char)b;   // line 3
System.out.println((char)c + " " + (int)c);

Until line 2 everything is fine:

  • In line 1 I could print "a" in the console and it would show "È".

  • In line 2 I could print "b" in the console and it would show -56, which corresponds to 200 because byte is signed (-56 + 256 = 200). And 200 is "È". So it's still fine.

But what's wrong in line 3? "c" becomes something else and the program prints ? 65480. That's something completely different.

What should I write in line 3 in order to get the correct result?

asked Jul 28 '13 by user1883212


People also ask

What is the difference between byte and char in Java?

The main difference between the byte and char data types is that byte is used to store raw binary data while char is used to store characters or text data. You can store a character literal in a char variable, e.g. char a = 'a'; a character literal is enclosed in single quotes.
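For illustration, a minimal sketch of that difference (the class name ByteVsChar is made up for this example):

public class ByteVsChar {
    public static void main(String[] args) {
        byte raw = 65;       // raw binary data: just the number 65
        char letter = 'A';   // text data: the character 'A', whose UTF-16 code unit is 65
        System.out.println(raw + " " + letter);   // prints: 65 A
    }
}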

Is char 1 or 2 bytes?

In C and C++, the char type takes 1 byte of memory (8 bits), which allows expressing 2^8 = 256 values; when signed, it can hold both positive and negative values, with a range of -128 to 127. In Java, however, char is 2 bytes (16 bits) and unsigned; the 1-byte signed type is byte.

Why does Java use 2 bytes for char?

Every char is made up of 2 bytes because Java internally uses UTF-16. For instance, if a String contains a word in the English language, the leading 8 bits of every char will all be 0, since an ASCII character can be represented using a single byte.
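A small sketch of that, encoding an ASCII letter with the standard UTF-16BE charset (the class name Utf16Demo is made up for this example):

import java.nio.charset.StandardCharsets;

public class Utf16Demo {
    public static void main(String[] args) {
        // 'A' (code point 65) becomes two bytes in UTF-16BE: 00 41
        byte[] bytes = "A".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(bytes.length);                      // 2
        System.out.printf("%02X %02X%n", bytes[0], bytes[1]);  // 00 41
    }
}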

How do you turn a char into a byte?

Syntax: byte by = (byte) ch; Here, ch is the char variable to be converted into a byte. The cast tells the compiler to convert the char into its byte equivalent value.
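A short sketch of that cast (the class name CharToByte is illustrative only); note that code units above 127 wrap into the negative byte range:

public class CharToByte {
    public static void main(String[] args) {
        char ch = 'A';
        byte by = (byte) ch;             // fits: 'A' is 65
        System.out.println(by);          // 65
        System.out.println((byte) 'È');  // -56: 200 no longer fits in a signed byte
    }
}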


1 Answer

A character in Java is a Unicode code unit which is treated as an unsigned number. So if you perform c = (char)b, the value you get is 2^16 - 56, i.e. 65536 - 56 = 65480.

Or more precisely, the byte is first converted to a signed integer with the value 0xFFFFFFC8 using sign extension in a widening conversion. This in turn is then narrowed down to 0xFFC8 when casting to a char, which translates to the positive number 65480.

From the language specification:

5.1.4. Widening and Narrowing Primitive Conversion

First, the byte is converted to an int via widening primitive conversion (§5.1.2), and then the resulting int is converted to a char by narrowing primitive conversion (§5.1.3).
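To make those intermediate values visible, here is a small runnable trace of the steps described above (the class name ConversionTrace is made up for illustration):

public class ConversionTrace {
    public static void main(String[] args) {
        char a = 'È';        // code point 200 (0x00C8)
        byte b = (byte) a;   // narrowed to -56 (0xC8)
        int widened = b;     // sign-extended to 0xFFFFFFC8
        char c = (char) b;   // widened to int, then narrowed to 0xFFC8 = 65480
        System.out.printf("%d 0x%08X %d%n", b, widened, (int) c);
        // prints: -56 0xFFFFFFC8 65480
    }
}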


To get the right result, use char c = (char) (b & 0xFF), which first converts the byte value of b to the positive integer 200 by using a mask, zeroing the top 24 bits after conversion: 0xFFFFFFC8 becomes 0x000000C8, i.e. the positive number 200 in decimal.
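A minimal sketch of that fix (the class name MaskFix is made up for this example):

public class MaskFix {
    public static void main(String[] args) {
        byte b = (byte) 'È';           // -56
        char c = (char) (b & 0xFF);    // 0xFFFFFFC8 & 0xFF == 0xC8 == 200
        System.out.println(c + " " + (int) c);   // prints: È 200
    }
}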


Above is a direct explanation of what happens during conversion between the byte, int and char primitive types.

If you want to encode/decode characters from bytes, use Charset, CharsetEncoder, CharsetDecoder or one of the convenience methods such as new String(byte[] bytes, Charset charset) or String#getBytes(Charset charset). Common character sets such as UTF-8 are available from StandardCharsets; others such as Windows-1252 can be looked up with Charset.forName.
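A small sketch of that approach, assuming the Windows-1252 charset is available on your JVM (the class name CharsetRoundTrip is illustrative only):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    public static void main(String[] args) {
        String original = "È";
        // 'È' is two bytes in UTF-8 (C3 88) but one byte in Windows-1252 (C8)
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] cp1252 = original.getBytes(Charset.forName("Windows-1252"));
        // Decoding with the same charset restores the original text
        System.out.println(new String(utf8, StandardCharsets.UTF_8));            // È
        System.out.println(new String(cp1252, Charset.forName("Windows-1252"))); // È
    }
}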

answered Sep 17 '22 by Maarten Bodewes