I stumbled over this (again) today:
class Test {
  char ok = '\n';
  char okAsWell = '\u000B';
  char error = '\u000A';
}
It does not compile:
Invalid character constant in line 4.
The compiler seems to insist that I write '\n' instead. I see no reason for this, yet it's very annoying.
Is there a logical explanation why characters that have a special notation (like \t, \n, \r) must be expressed in that form in Java source?
Java strings are sequences of UTF-16 code units, which is an encoding of Unicode. The Unicode character set is a superset of ASCII, so a Java string can contain characters that do not belong to ASCII.
Java uses the Unicode character set, not ASCII, to represent character data. The type of result produced by a mathematical expression depends on the types of the operands. Promotion is a widening conversion that the compiler applies implicitly; a conversion that the programmer requests explicitly is a cast.
ASCII is a 7-bit character set with 128 characters, numbered 0 to 127. It assigns a numeric value to each character; for example, A has the value 65. An ASCII table lists these values. Because the first 128 Unicode code points coincide with ASCII, a Java char that holds an ASCII character has that same numeric value.
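As a small illustration of the point about numeric values, here is a minimal sketch. The class name AsciiValueDemo and its helper methods are made up for this example.

```java
public class AsciiValueDemo {
    // A char widens to int implicitly; the result is its UTF-16 code unit,
    // which equals the ASCII value for the first 128 characters.
    static int codeOf(char c) {
        return c;
    }

    // True when the character falls inside the 7-bit ASCII range 0..127.
    static boolean isAscii(char c) {
        return c < 128;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));        // 65
        System.out.println((char) 65);          // A
        System.out.println(isAscii('A'));       // true
        System.out.println(isAscii('\u00E9'));  // false: e with acute accent is outside ASCII
    }
}
```

Going from int back to char needs an explicit cast, because that direction is a narrowing conversion.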
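Unicode escapes are replaced by the characters they denote before the compiler parses the line, so your line is replaced by the compiler with:
char error = '
';
which is not a valid Java statement, because a character literal cannot contain a raw line terminator.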
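The same early translation can be seen with an escape that does compile. The sketch below (class and method names are invented for the illustration) uses \u0022, the escape for the double quote, which ends the string literal early. It also shows two spellings of a line feed that are not affected by the translation step.

```java
public class UnicodeEscapeDemo {
    // The escape for the double quote is translated before the compiler
    // tokenizes the line, so it closes the string literal early. What the
    // compiler sees is "a".length() + "b".length().
    static int puzzler() {
        return "a\u0022.length() + \u0022b".length();
    }

    // The escape sequence is resolved later, inside the literal, so it is safe.
    static char lineFeedEscape() {
        return '\n';
    }

    // A numeric cast avoids character escapes altogether.
    static char lineFeedCast() {
        return (char) 0x0A;
    }

    public static void main(String[] args) {
        System.out.println(puzzler());                           // 2
        System.out.println(lineFeedEscape() == lineFeedCast());  // true
    }
}
```

Note that the comments in this sketch deliberately avoid spelling out the line feed escape: the translation also applies inside comments, so a line comment containing it would be cut in two.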
This is dictated by the Language Specification:
A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) of the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.
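To illustrate the sentence about supplementary characters, here is a minimal sketch using U+1F600 as an example code point; the class and member names are assumptions made for the illustration. The character is written as two consecutive escapes, one per UTF-16 surrogate.

```java
public class SupplementaryEscapeDemo {
    // U+1F600 lies outside the Basic Multilingual Plane, so the source needs
    // two escapes: the high surrogate followed by the low surrogate.
    static final String GRINNING_FACE = "\uD83D\uDE00";

    // Number of UTF-16 code units (chars) in the string.
    static int codeUnits() {
        return GRINNING_FACE.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints() {
        return GRINNING_FACE.codePointCount(0, GRINNING_FACE.length());
    }

    // The code point that the surrogate pair encodes.
    static int codePoint() {
        return GRINNING_FACE.codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(codeUnits());                       // 2
        System.out.println(codePoints());                      // 1
        System.out.println(Integer.toHexString(codePoint()));  // 1f600
    }
}
```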
This can lead to surprising stuff. For example, this is a valid Java program (it contains hidden Unicode characters), courtesy of Peter Lawrey:
public static void main(String[] args) {
    for (char ch = 0; ch < Character.MAX_VALUE; ch++) {
        if (Character.isJavaIdentifierPart(ch) && !Character.isJavaIdentifierStart(ch)) {
            System.out.printf("%04x <%s>%n", (int) ch, "" + ch);
        }
    }
}