Hi all, I was browsing through some of the Java source code when I came across this in java.lang.Character:
public static boolean isHighSurrogate(char ch) {
    return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
}

public static boolean isLowSurrogate(char ch) {
    return ch >= MIN_LOW_SURROGATE && ch < (MAX_LOW_SURROGATE + 1);
}
I was wondering why the writer added 1 to the upper limit and used a less-than comparison, instead of simply using a less-than-or-equal comparison. I could understand it if it helped readability, but that doesn't seem to be the case here.
What's the difference between the code above and this:
public static boolean isHighSurrogate(char ch) {
    return ch >= MIN_HIGH_SURROGATE && ch <= MAX_HIGH_SURROGATE;
}

public static boolean isLowSurrogate(char ch) {
    return ch >= MIN_LOW_SURROGATE && ch <= MAX_LOW_SURROGATE;
}
Perhaps the author is trying to be consistent with Dijkstra's advice to make all ranges half-open -- the start point is inclusive and the endpoint is exclusive.
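If so, a minimal sketch of the two conventions side by side (hypothetical helpers of my own, not JDK code) might look like this:

// Hypothetical helpers illustrating the two range conventions; not from the JDK.
public final class RangeStyles {

    // Half-open range: start inclusive, end exclusive (Dijkstra's preference).
    static boolean inHalfOpenRange(char ch, char startInclusive, int endExclusive) {
        return ch >= startInclusive && ch < endExclusive;
    }

    // Closed range: both endpoints inclusive.
    static boolean inClosedRange(char ch, char min, char max) {
        return ch >= min && ch <= max;
    }

    public static void main(String[] args) {
        char ch = Character.MIN_HIGH_SURROGATE;
        // Both forms accept exactly the same set of code units.
        System.out.println(inHalfOpenRange(ch, Character.MIN_HIGH_SURROGATE,
                Character.MAX_HIGH_SURROGATE + 1)); // true
        System.out.println(inClosedRange(ch, Character.MIN_HIGH_SURROGATE,
                Character.MAX_HIGH_SURROGATE));     // true
    }
}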
There is no semantic difference here, but there is a subtle difference in bytecode: (MAX_HIGH_SURROGATE + 1) is an int, so the first code snippet does a char-to-char comparison followed by an int-to-int comparison, while the second does two char-to-char comparisons. This does not lead to a semantic difference -- the implicit casts are to wider types, so there is no risk of overflow in either code snippet.
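To see the promotion concretely, here is a small standalone illustration (my own sketch, not JDK code):

public class PromotionDemo {
    public static void main(String[] args) {
        // MAX_HIGH_SURROGATE is a char (0xDBFF); adding the int literal 1
        // promotes the result to int, so the comparison runs in int arithmetic
        // and cannot wrap around.
        int upperExclusive = Character.MAX_HIGH_SURROGATE + 1;
        char ch = Character.MAX_HIGH_SURROGATE;
        System.out.println(Integer.toHexString(upperExclusive)); // prints dc00
        System.out.println(ch < upperExclusive);                 // true
        System.out.println(ch <= Character.MAX_HIGH_SURROGATE);  // true, same result
    }
}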
Optimizing out the addition and converting the int-to-int comparison back into a 2-byte unsigned int comparison is well within the scope of the kinds of optimizations done by the JIT, so I don't see any particular performance reason to prefer one over the other.
I tend to write this kind of thing as

MIN_LOW_SURROGATE <= ch && ch <= MAX_LOW_SURROGATE

That way, the ch in the middle makes it obvious to a reader that ch is being tested against the range formed by the outer values.
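For example, isLowSurrogate written in that style (just a sketch of the preference, not the actual JDK source) would read:

public static boolean isLowSurrogate(char ch) {
    // The value under test sits between the two bounds, mirroring the math notation.
    return Character.MIN_LOW_SURROGATE <= ch && ch <= Character.MAX_LOW_SURROGATE;
}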
Wild guess:
A surrogate character is any of a range of Unicode code points that are used in pairs in UTF-16 to represent characters beyond the Basic Multilingual Plane.
My guess is that the author wanted to guard against 8-bit behaviour: if the maximum were 0xFF, then 0xFF + 1 would overflow and wrap back to 0x00, making the comparison always false. So if the code were compiled with 8-bit chars, it would always return false (outside the UTF-16 range), while with chars wider than 8 bits, 0xFF + 1 would be 0x100 and the check would still work.
Hope this makes some sense to you.
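To illustrate the wrap-around scenario described above (purely hypothetical, since a Java char is always 16 bits and the addition is done in int arithmetic), masking the sum to 8 bits simulates the effect:

public class WrapAroundDemo {
    public static void main(String[] args) {
        // Hypothetical 8-bit arithmetic: 0xFF + 1 truncated to 8 bits wraps to 0.
        int max = 0xFF;
        int wrapped = (max + 1) & 0xFF;
        System.out.println(wrapped);        // 0
        System.out.println(0x42 < wrapped); // false -- the range check always fails
        // In real Java, MAX_HIGH_SURROGATE + 1 is evaluated as an int (0xDC00),
        // so no such wrap-around occurs.
    }
}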