
Why can some ASCII characters not be expressed in the form '\uXXXX' in Java source code?

Tags:

java

I stumbled over this (again) today:

    class Test {
        char ok = '\n';
        char okAsWell = '\u000B';
        char error = '\u000A';
    }

It does not compile:

Invalid character constant in line 4.

The compiler seems to insist that I write '\n' instead. I see no reason for this, yet it's very annoying.

Is there a logical explanation why characters that have a special notation (like \t, \n, \r) must be expressed in that form in Java source?

asked Mar 07 '13 16:03 by Durandal




1 Answer

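Unicode escapes are replaced by the characters they denote before the compiler tokenizes the source, so your line reaches the compiler as:

    char error = '
    ';

which is not valid Java: a character literal cannot contain a raw line terminator. The same happens with '\u000D' (carriage return).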

This is dictated by the Java Language Specification (§3.3, Unicode Escapes):

A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) of the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.
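To see how early that translation step runs, here is a small sketch in which escapes stand in for part of an identifier and for the quotes around a string. The class and field names (EarlyEscapes, value, quoted) are made up for illustration; the comments deliberately avoid spelling out an escape, because escapes are translated inside comments too.

```java
public class EarlyEscapes {
    // The escape in the field name stands for the letter 'a',
    // so after translation the field is simply called "value".
    static int v\u0061lue = 42;

    // Code point 0022 is the double quote, so after translation
    // this is an ordinary string literal containing hello.
    static String quoted = \u0022hello\u0022;

    public static void main(String[] args) {
        System.out.println(value);            // prints 42
        System.out.println(quoted.length());  // prints 5
    }
}
```

By the time the compiler splits the input into tokens, the escapes are already gone, which is also why an escaped line feed ends up as a real line break in the middle of a character literal.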

This can lead to surprising results. For example, the following is valid Java code (the identifier contains hidden Unicode characters), courtesy of Peter Lawrey:

    public static void main(String[] args) {
        for (char c‮h = 0; c‮h < Character.MAX_VALUE; c‮h++) {
            if (Character.isJavaIdentifierPart(c‮h) && !Character.isJavaIdentifierStart(c‮h)) {
                System.out.printf("%04x <%s>%n", (int) c‮h, "" + c‮h);
            }
        }
    }
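Coming back to the original problem, here is a minimal sketch of ways to get a line feed into a char without writing '\u000A'. The class and field names (LineFeedLiterals, ESCAPE, OCTAL, CAST) are hypothetical; the point is only that escape sequences and casts are resolved after tokenization, so they are unaffected by the early Unicode translation.

```java
public class LineFeedLiterals {
    // The escape sequence is interpreted inside the literal, after tokenization.
    static final char ESCAPE = '\n';

    // Octal escape for code point 10; also interpreted inside the literal.
    static final char OCTAL = '\12';

    // A numeric cast avoids the character literal altogether.
    static final char CAST = (char) 0x000A;

    public static void main(String[] args) {
        System.out.println(ESCAPE == OCTAL && OCTAL == CAST); // prints true
        System.out.println((int) ESCAPE);                     // prints 10
    }
}
```

All three fields hold the same value; '\n' is probably the most readable choice, and the cast is handy when the code point comes from a table of hexadecimal values.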
answered Sep 28 '22 01:09 by assylias