I've run into something strange while working with Java. Maybe it's ordinary behaviour, but I don't understand why it works this way.
I have code like this:
Character x = 'B';
Object o = x;
System.out.println(o == 'B');
It works fine and the output is "true". Then I change the English B to the Cyrillic B (Б):
Character x = 'Б';
Object o = x;
System.out.println(o == 'Б');
Now the output is "false". How come? By the way, the output is still "true" if I compare the x variable with 'Б' directly, but when I do it through an Object it works differently.
Can anyone please explain this behaviour?
Without boxing (using just char) you'd be fine. Likewise, if you use equals instead of ==, you'd be fine. The problem is that you're comparing references for boxed values using ==, which just checks for reference identity. You're seeing a difference because of the way auto-boxing works. You can see the same thing with Integer:
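// Small value: both boxings yield the same cached object
Object x = 0;
Object y = 0;
System.out.println(x == y); // Guaranteed to be true

// Larger value: the two boxings may or may not share an object
Object a = 10000;
Object b = 10000;
System.out.println(a == b); // *May* be true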
Basically "small" values have cached boxed representations, whereas "larger" values may not.
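The two fixes mentioned above (compare as a plain char, or use equals) can be sketched as follows. The class name BoxedCompareDemo and the helper sameChar are made up for illustration, and the Cyrillic Б is written as the escape '\u0411' so the example does not depend on source-file encoding.

```java
class BoxedCompareDemo {
    // Hypothetical helper: compares a boxed value held in an Object against a
    // char by value rather than by reference identity.
    static boolean sameChar(Object o, char expected) {
        return o instanceof Character && ((Character) o).charValue() == expected;
    }

    public static void main(String[] args) {
        char c = '\u0411';   // Cyrillic capital letter Be
        Character x = c;     // autoboxed
        Object o = x;

        // Character == char: x is unboxed and the values are compared.
        System.out.println(x == c);

        // equals boxes c and compares the wrapped values.
        System.out.println(o.equals(c));

        // Explicit unboxing through the helper.
        System.out.println(sameChar(o, c));
    }
}
```

All three comparisons look at the character value, so none of them depends on whether the boxed objects happen to be shared.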
From JLS 5.1.7:
If the value p being boxed is an integer literal of type int between -128 and 127 inclusive (§3.10.1), or the boolean literal true or false (§3.10.3), or a character literal between '\u0000' and '\u007f' inclusive (§3.10.4), then let a and b be the results of any two boxing conversions of p. It is always the case that a == b.
Ideally, boxing a primitive value would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rule above is a pragmatic compromise, requiring that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly. For other values, the rule disallows any assumptions about the identity of the boxed values on the programmer's part. This allows (but does not require) sharing of some or all of these references. Notice that integer literals of type long are allowed, but not required, to be shared.
This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all char and short values, as well as int and long values in the range of -32K to +32K.
The part about "a character literal between '\u0000' and '\u007f'" guarantees that boxed ASCII characters will be cached, but makes no such guarantee for non-ASCII characters.
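A small sketch of what that guarantee means in practice. The class and method names here are invented for the example. Only the results for the ASCII range are guaranteed. The result for Б is left unspecified by the rule quoted above, so the comment does not promise a value.

```java
class CharacterCacheDemo {
    // Hypothetical helper: boxes the same char twice and reports whether both
    // boxings produced the very same object.
    static boolean boxesToSameObject(char c) {
        Object first = c;   // autoboxing
        Object second = c;  // autoboxing again
        return first == second; // reference comparison between two Objects
    }

    public static void main(String[] args) {
        // Inside '\u0000'..'\u007f': guaranteed to be the same object.
        System.out.println(boxesToSameObject('B'));
        System.out.println(boxesToSameObject('\u007f'));

        // Outside the guaranteed range (Cyrillic capital letter Be):
        // the result is unspecified; an implementation may or may not share.
        System.out.println(boxesToSameObject('\u0411'));
    }
}
```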
When you write Character x = 'B', the compiler emits a call to Character.valueOf(char):
2: invokestatic #16 // Method java/lang/Character.valueOf:(C)Ljava/lang/Character;
That method caches small values. Its documentation states:
This method will always cache values in the range '\u0000' to '\u007F', inclusive, and may cache other values outside of this range.
public static Character valueOf(char c) {
    if (c <= 127) { // must cache
        return CharacterCache.cache[(int)c];
    }
    return new Character(c);
}
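To see why this pattern makes == behave differently on either side of 127, here is a self-contained imitation of the valueOf shown above. BoxedChar is a made-up class written only for this illustration and is not JDK code. It assumes the simplest possible policy: share instances for 0..127 and allocate a fresh object for everything else.

```java
final class BoxedChar {
    // Pre-built shared instances for the values 0..127.
    private static final BoxedChar[] CACHE = new BoxedChar[128];
    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = new BoxedChar((char) i);
        }
    }

    private final char value;

    private BoxedChar(char value) {
        this.value = value;
    }

    // Same shape as the method above: cached instance for small values,
    // a new object for anything larger (an assumption of this sketch).
    static BoxedChar valueOf(char c) {
        if (c <= 127) {
            return CACHE[c];
        }
        return new BoxedChar(c);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof BoxedChar && ((BoxedChar) other).value == value;
    }

    @Override
    public int hashCode() {
        return value;
    }

    public static void main(String[] args) {
        // Same cached object both times: prints true.
        System.out.println(BoxedChar.valueOf('B') == BoxedChar.valueOf('B'));
        // Two separately allocated objects: prints false.
        System.out.println(BoxedChar.valueOf('\u0411') == BoxedChar.valueOf('\u0411'));
        // Value comparison ignores identity: prints true.
        System.out.println(BoxedChar.valueOf('\u0411').equals(BoxedChar.valueOf('\u0411')));
    }
}
```

With this policy the identity check flips exactly at the cache boundary, while equals gives the same answer on both sides.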