Why does '(int)(char)(byte)-2' produce 65534 in Java?

Tags:

java

casting

I encountered this question in a technical test for a job. Given the following code example:

```
public class Manager {
    public static void main(String[] args) {
        System.out.println((int) (char) (byte) -2);
    }
}
```

It gives the output as 65534.

This behavior only shows for negative values; 0 and positive numbers print the same value that was passed to SOP (System.out.println). The byte cast here is insignificant; I have tried without it.

So my question is: what exactly is going on here?


asked Jul 08 '14 15:07

mangoCar

2 Answers

There are some prerequisites that we need to agree upon before you can understand what is happening here. Once you understand the following points, the rest is simple deduction:

All primitive types within the JVM are represented as a sequence of bits. The int type is represented by 32 bits, the char and short types by 16 bits and the byte type is represented by 8 bits.
All integral JVM numbers are signed; the char type is the only unsigned "number". When a number is signed, its highest bit represents the sign: 0 stands for a non-negative number (positive or zero) and 1 for a negative number. In addition, signed numbers use two's complement notation, in which negative values count down from the all-ones pattern, i.e. in the inverse order of the positive numbers. For example, a positive byte value is represented in bits as follows:
```
00 00 00 00 => (byte) 0
00 00 00 01 => (byte) 1
00 00 00 10 => (byte) 2
...
01 11 11 11 => (byte) Byte.MAX_VALUE
```
while the bit order for negative numbers is inverted:
```
11 11 11 11 => (byte) -1
11 11 11 10 => (byte) -2
11 11 11 01 => (byte) -3
...
10 00 00 00 => (byte) Byte.MIN_VALUE
```
This inverted notation also explains why the negative range holds one more value than the positive range: half of the bit patterns start with 0, and one of them is needed for the number 0, which leaves 127 positive values against 128 negative ones. Remember, all of this is only a matter of interpreting a bit pattern. Negative numbers could be encoded differently, but this notation is quite handy because it allows for some rather fast transformations, as a small example later on will show.
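
To check the bit patterns listed above, here is a small sketch. The class and helper names (`TwosComplementBytes`, `bits`, `negate`) are made up for illustration; `negate` just spells out the two's complement rule "invert all bits, then add one".

```java
// Hypothetical demo class: prints byte bit patterns and negates in two's complement.
class TwosComplementBytes {

    // Returns the lowest `width` bits of `value`, most significant bit first.
    static String bits(int value, int width) {
        StringBuilder sb = new StringBuilder(width);
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    // Two's complement negation: invert all bits, then add one.
    static byte negate(byte x) {
        return (byte) (~x + 1);
    }

    public static void main(String[] args) {
        byte[] samples = {0, 1, 2, Byte.MAX_VALUE, -1, -2, -3, Byte.MIN_VALUE};
        for (byte b : samples) {
            System.out.println(bits(b, 8) + " => (byte) " + b);
        }
        System.out.println(negate((byte) 5));
        System.out.println(negate(Byte.MIN_VALUE));
    }
}
```

Note that `negate(Byte.MIN_VALUE)` returns `Byte.MIN_VALUE` again: +128 has no byte representation, which is the extra negative value mentioned above.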

As mentioned, none of this applies to the char type. The char type represents a Unicode character with a non-negative "numeric range" of 0 to 65535. Each of these numbers refers to a 16-bit Unicode (UTF-16) code unit.
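
A quick sketch to confirm that char is an unsigned 16-bit type; `CharRange` and `increment` are hypothetical names. Incrementing the largest char wraps around to 0 instead of turning negative.

```java
// Hypothetical demo class: shows the unsigned 16-bit range of char.
class CharRange {

    // Increments a char; at the top of the range it wraps around to 0.
    static char increment(char c) {
        c++;
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);                          // 16
        System.out.println((int) Character.MIN_VALUE);               // 0
        System.out.println((int) Character.MAX_VALUE);               // 65535
        System.out.println((int) increment(Character.MAX_VALUE));    // 0
    }
}
```
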
When converting between the int, short, char and byte types, the JVM needs to either add or truncate bits (boolean cannot be converted to or from any of them).

If the target type is represented by more bits than the source type, the JVM simply fills the additional slots with the value of the highest bit of the given value, which is its sign bit (this is called sign extension):
```
|     short   |     byte    |
|             | 00 00 00 01 | => (byte) 1
| 00 00 00 00 | 00 00 00 01 | => (short) 1
```
Thanks to the inverted notation, this strategy also works for negative numbers:
```
|     short   |     byte    |
|             | 11 11 11 11 | => (byte) -1
| 11 11 11 11 | 11 11 11 11 | => (short) -1
```
This way, the value's sign is retained. Without going into the details of implementing this in a JVM, note that this model allows a cast to be performed by a cheap shift operation, which is obviously advantageous.
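
Here is a sketch of sign extension, assuming a made-up class `SignExtension`. It widens bytes to shorts and also shows one way to sign-extend with two shifts: move the byte's top bit into bit 31, then let the arithmetic right shift (`>>`) copy it back down. This only illustrates the idea; it is not necessarily what a particular JVM does internally.

```java
// Hypothetical demo class: sign extension by widening and by shifting.
class SignExtension {

    // Returns the lowest `width` bits of `value`, most significant bit first.
    static String bits(int value, int width) {
        StringBuilder sb = new StringBuilder(width);
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    // Same result as (int) (byte) x: keep the low 8 bits and copy bit 7 upwards.
    static int signExtendLowByte(int x) {
        return (x << 24) >> 24;
    }

    public static void main(String[] args) {
        byte one = 1;
        byte minusOne = -1;
        short s1 = one;      // implicit widening, upper byte filled with 0
        short s2 = minusOne; // implicit widening, upper byte filled with 1
        System.out.println(bits(s1, 16) + " => (short) " + s1);
        System.out.println(bits(s2, 16) + " => (short) " + s2);
        System.out.println(signExtendLowByte(0xFE)); // -2
    }
}
```
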

An exception to this rule is widening a char, which is, as we said before, unsigned. A conversion from a char always fills the additional bits with 0 (zero extension), because there is no sign to retain and thus no need for two's complement notation. A conversion of a char to an int is therefore performed as:
```
|            int            |    char     |     byte    |
|                           | 11 11 11 11 | 11 11 11 11 | => (char) \uFFFF
| 00 00 00 00 | 00 00 00 00 | 11 11 11 11 | 11 11 11 11 | => (int) 65535
```
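
Zero extension can be checked with a small sketch; `ZeroExtension`, `viaCast` and `viaMask` are hypothetical names. Since the char keeps the low 16 bits and widening fills the rest with zeros, `(int) (char) x` should equal `x & 0xFFFF` for every int.

```java
// Hypothetical demo class: char widening is zero extension.
class ZeroExtension {

    // Narrow to 16 bits, then widen with zeros.
    static int viaCast(int x) {
        return (int) (char) x;
    }

    // Keep the low 16 bits and clear the upper 16 bits.
    static int viaMask(int x) {
        return x & 0xFFFF;
    }

    public static void main(String[] args) {
        char c = '\uFFFF';
        System.out.println((int) c);     // 65535
        System.out.println(viaCast(-2)); // 65534
        System.out.println(viaMask(-2)); // 65534
    }
}
```
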
When the source type has more bits than the target type, the additional bits are simply cut off. As long as the original value fits into the target type, this works fine, as for example in the following conversion of a short to a byte:
```
|     short   |     byte    |
| 00 00 00 00 | 00 00 00 01 | => (short) 1
|             | 00 00 00 01 | => (byte) 1
| 11 11 11 11 | 11 11 11 11 | => (short) -1
|             | 11 11 11 11 | => (byte) -1
```
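
A minimal sketch of in-range narrowing, using a made-up class `NarrowingInRange`: every short between -128 and 127 should survive the cast to byte unchanged, because its low 8 bits already encode the same number.

```java
// Hypothetical demo class: narrowing a short that fits into a byte.
class NarrowingInRange {

    // Keeps only the low 8 bits of the short.
    static byte narrow(short s) {
        return (byte) s;
    }

    public static void main(String[] args) {
        System.out.println(narrow((short) 1));  // 1
        System.out.println(narrow((short) -1)); // -1
        // Check every short in the byte range.
        for (short s = Byte.MIN_VALUE; s <= Byte.MAX_VALUE; s++) {
            if (narrow(s) != s) {
                throw new AssertionError(s);
            }
        }
        System.out.println("all values in -128..127 preserved");
    }
}
```
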
However, if the value is too big or too small for the target type, this no longer works:
```
|     short   |     byte    |
| 00 00 00 01 | 00 00 00 01 | => (short) 257
|             | 00 00 00 01 | => (byte) 1
| 11 11 11 11 | 00 00 00 00 | => (short) -256
|             | 00 00 00 00 | => (byte) 0
```
This is why narrowing casts sometimes lead to strange results. You might wonder why narrowing is implemented this way. You could argue that it would be more intuitive if the JVM checked a number's range and instead cast an incompatible number to the biggest representable value of the same sign. However, this would require branching, which is a costly operation. This matters especially because two's complement notation otherwise allows for cheap arithmetic operations.
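
To contrast the two approaches, here is a sketch with a made-up class `NarrowingOutOfRange`. `truncate` is what the `(byte)` cast actually does; `clamp` is a hypothetical saturating conversion that Java does not perform, included only to show the extra branches it would need.

```java
// Hypothetical demo class: truncation (real Java behavior) vs. clamping (not Java).
class NarrowingOutOfRange {

    // What the (byte) cast actually does: keep the low 8 bits.
    static byte truncate(short s) {
        return (byte) s;
    }

    // Hypothetical saturating cast, NOT Java's behavior: needs two branches.
    static byte clamp(short s) {
        if (s > Byte.MAX_VALUE) return Byte.MAX_VALUE;
        if (s < Byte.MIN_VALUE) return Byte.MIN_VALUE;
        return (byte) s;
    }

    public static void main(String[] args) {
        short big = 257;    // 00000001 00000001
        short small = -256; // 11111111 00000000
        System.out.println(truncate(big) + " vs " + clamp(big));     // 1 vs 127
        System.out.println(truncate(small) + " vs " + clamp(small)); // 0 vs -128
    }
}
```
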

With all this information, we can see what happens with the number -2 in your example:

```
|           int           |    char     |     byte    |
| 11 11 11 11 11 11 11 11 | 11 11 11 11 | 11 11 11 10 | => (int) -2
|                         |             | 11 11 11 10 | => (byte) -2
|                         | 11 11 11 11 | 11 11 11 10 | => (char) \uFFFE
| 00 00 00 00 00 00 00 00 | 11 11 11 11 | 11 11 11 10 | => (int) 65534
```
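
The same walkthrough can be reproduced in code. This is a sketch with a hypothetical class `CastChain`; it prints the bits after each cast of `(int) (char) (byte) -2`.

```java
// Hypothetical demo class: traces -2 through (int) (char) (byte) -2.
class CastChain {

    // Returns the lowest `width` bits of `value`, most significant bit first.
    static String bits(int value, int width) {
        StringBuilder sb = new StringBuilder(width);
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1);
        }
        return sb.toString();
    }

    // The expression from the question, applied to an arbitrary int.
    static int chain(int x) {
        return (int) (char) (byte) x;
    }

    public static void main(String[] args) {
        int start = -2;
        byte b = (byte) start; // narrowing: keep the low 8 bits
        char c = (char) b;     // byte -> int (sign extension) -> char (low 16 bits)
        int end = c;           // widening a char: zero extension
        System.out.println(bits(start, 32) + " => (int) " + start);
        System.out.println(bits(b, 8) + " => (byte) " + b);
        System.out.println(bits(c, 16) + " => (char) " + (int) c);
        System.out.println(bits(end, 32) + " => (int) " + end);
    }
}
```
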

As you can see, the byte cast is redundant here: -2 fits into a byte, so the cast to char cuts off the same bits with or without it.
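
A sketch comparing the expression with and without the byte cast, using hypothetical methods `withByte` and `withoutByte`. Both agree for inputs in the byte range (-128 to 127), but they differ outside of it, e.g. for 200, where the byte cast first turns the value into -56.

```java
// Hypothetical demo class: is the (byte) cast redundant?
class ByteCastCheck {

    static int withByte(int x) {
        return (int) (char) (byte) x;
    }

    static int withoutByte(int x) {
        return (int) (char) x;
    }

    public static void main(String[] args) {
        for (int x : new int[] {-2, 0, 5, 200, -200}) {
            System.out.println(x + ": " + withByte(x) + " / " + withoutByte(x));
        }
    }
}
```

So for positive values above 127, the byte cast does change the printed result.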

All of this is also specified in the Java Language Specification (JLS) and, at the bytecode level, in the JVMS, if you prefer a more formal definition of these rules.

One final remark: a type's bit size does not necessarily equal the number of bits the JVM reserves for representing this type in memory. As a matter of fact, the JVM does not distinguish between the boolean, byte, short, char and int types for local variables and operand stack values. All of them are represented by the same JVM type (int), and the virtual machine merely emulates these casts (for example with the i2b, i2c and i2s instructions). In a method's local variables and on its operand stack, every value of these types occupies one 32-bit slot. This is not necessarily true for arrays and object fields, whose layout each JVM implementer can choose freely.

answered Nov 17 '22 18:11

Rafael Winterhalter

There are two important things to note here:

1. A char is unsigned and cannot be negative.
2. Casting a byte to a char first involves a hidden cast to an int, as per the Java Language Spec (§5.1.4).
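
The hidden int step can be made explicit. In this sketch (the class `ByteToChar` and its methods are hypothetical), `viaInt` widens the byte to an int first and only then narrows to char; it should agree with the direct cast for every byte.

```java
// Hypothetical demo class: byte -> char goes through int.
class ByteToChar {

    static char direct(byte b) {
        return (char) b;
    }

    static char viaInt(byte b) {
        int widened = b;        // widening: sign extension to 32 bits
        return (char) widened;  // narrowing: keep the low 16 bits
    }

    public static void main(String[] args) {
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; i++) {
            byte b = (byte) i;
            if (direct(b) != viaInt(b)) {
                throw new AssertionError(b);
            }
        }
        System.out.println((int) direct((byte) -2)); // 65534
    }
}
```
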

Thus the byte -2 is first widened to the int 11111111 11111111 11111111 11111110. Notice how the two's complement value has been sign-extended with ones; that only happens for negative values. When we then narrow it to a char, the int is truncated to its low 16 bits:

```
11111111 11111110
```

Finally, casting this char to an int extends it with zeros rather than ones, because the value is now considered to be positive (chars can only be positive). Widening therefore leaves the char's value unchanged, but unlike the sign extension in the first step, it does not bring back the original negative number. The resulting binary value, 00000000 00000000 11111111 11111110, is 65534 in decimal.
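
These intermediate bit patterns can be printed with `Integer.toBinaryString`, which takes an int, so byte and char arguments are widened first. The class `BinaryStrings` and its methods are hypothetical names for this sketch.

```java
// Hypothetical demo class: bit patterns seen through Integer.toBinaryString.
class BinaryStrings {

    // The byte is sign-extended to int before conversion: 32 digits, leading ones.
    static String widenedByte(byte b) {
        return Integer.toBinaryString(b);
    }

    // The char is zero-extended; toBinaryString omits the leading zeros.
    static String widenedChar(char c) {
        return Integer.toBinaryString(c);
    }

    public static void main(String[] args) {
        byte b = -2;
        char c = (char) b;
        System.out.println(widenedByte(b));
        System.out.println(widenedChar(c));
        System.out.println((int) c); // 65534
    }
}
```
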

answered Nov 17 '22 17:11

Chris K

