Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do some character literals cause Syntax Errors in Java?

In the latest edition of JavaSpecialists newsletter, the author mentions a piece of code that is un-compilable in Java

public class A1 {
  Character aChar = '\u000d';
}

Try compile it, and you will get an error, such as:

A1.java:2: illegal line end in character literal
              Character aChar = '\u000d';
                                ^

Why an equivalent piece of c# code does not show such a problem?

public class CharacterFixture
{
  char aChar = '\u000d';
}

Am I missing anything?

EDIT: My original intention of question was how c# compiler got unicode file parsing correct (if so) and why java should still stick with the incorrect(if so) parsing? EDIT: Also i want myoriginal question title to be restored? Why such a heavy editing and i strongly suspect that it heavily modified my intentions.

like image 374
suhair Avatar asked Oct 29 '12 06:10

suhair


1 Answers

Java's compiler translates \uxxxx escape sequences as one of the very first steps, even before the tokenizer gets a crack at the code. By the time it actually starts tokenizing, there are no \uxxxx sequences anymore; they're already turned into the chars they represent, so to the compiler your Java example looks the same as if you'd actually typed a carriage return in there somehow. It does this in order to provide a way to use Unicode within the source, regardless of the source file's encoding. Even ASCII text can still fully represent Unicode chars if necessary (at the cost of readability), and since it's done so early, you can have them almost anywhere in the code. (You could say \u0063\u006c\u0061\u0073\u0073\u0020\u0053\u0074\u0075\u0066\u0066\u0020\u007b\u007d, and the compiler would read it as class Stuff {}, if you wanted to be annoying or torture yourself.)

C# doesn't do that. \uxxxx is translated later, with the rest of the program, and is only valid in certain types of tokens (namely, identifiers and string/char literals). This means it can't be used in certain places where it can be used in Java. cl\u0061ss is not a keyword, for example.

like image 141
cHao Avatar answered Oct 13 '22 00:10

cHao