
Why can't I use \u000D and \u000A as CR and LF in Java?

Tags: java, unicode

Why can't I use \u000D and \u000A as CR and LF in Java? It's giving an error when I compile the code:

String x = "\u000A hello";//Error - Illegal escape character in string literal. 
sadananda salam asked Oct 05 '10 17:10




2 Answers

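Unicode escapes are translated in a pre-processing step, before the compiler tokenizes the source. Therefore, if you put \u000A in a String literal like this:

String someString = "foo\u000Abar";

it will be compiled exactly as if you had written:

String someString = "foo
bar";

A string literal cannot span a line break, so the compiler reports an error.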
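To make the pre-processing step concrete, here is a rough sketch that imitates it on a piece of source text. The names UnicodeEscapeSketch, isHex4 and translate are made up for illustration. The logic is a simplification: it ignores the language rules about runs of backslashes and repeated u characters, and it is not how javac is actually implemented.

```java
public class UnicodeEscapeSketch {

    // True if the four characters starting at 'from' are all hex digits.
    static boolean isHex4(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int i = from; i < from + 4; i++) {
            if (Character.digit(s.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical stand-in for the early translation pass: every
    // backslash, 'u' and four hex digits is replaced by the character
    // with that code, no matter where it appears in the text.
    static String translate(String source) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < source.length()) {
            if (source.startsWith("\\u", i) && isHex4(source, i + 2)) {
                out.append((char) Integer.parseInt(source.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(source.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as literal text in this string.
        String before = "String someString = \"foo\\u000Abar\";";
        System.out.println(translate(before));
    }
}
```

Running main prints the statement split over two lines, which is the text the rest of the compiler would then have to parse.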

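Stick to the escape sequences \r (carriage return; 0x0D) and \n (line feed; 0x0A), which are interpreted inside the string literal rather than in the pre-processing step.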
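A minimal sketch of the safe alternatives, assuming nothing beyond the standard library. The class name LineEndings and the helper fromCodePoint are illustrative, not from the question:

```java
public class LineEndings {

    // Escape sequences are resolved when the string literal itself is
    // parsed, so they never break the line in the source file.
    static final String LF = "\n";
    static final String CR = "\r";

    // Building the character from its numeric code also avoids the problem.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println((int) LF.charAt(0));              // 10
        System.out.println((int) CR.charAt(0));              // 13
        System.out.println(fromCodePoint(0x0A).equals(LF));  // true
        // Unicode escapes for ordinary characters are still fine in literals.
        System.out.println("\u0041".equals("A"));            // true
    }
}
```

Only the escapes for the line terminators (and a few others such as the double quote) cause trouble inside a literal; an escape for a letter like U+0041 compiles without complaint.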

Bonus: You can always have fun with this, especially given the limitations on most syntax highlighters. Next time you've got a sec, try running this code:

public class FalseIsTrue {
    public static void main(String[] args) {
        if ( false == true ) { //these characters are magic: \u000a\u007d\u007b
            System.out.println("false is true!");
        }
    }
}
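For anyone puzzled by the result, this is roughly what the compiler sees once the three escapes in the comment have been translated into a line feed, a closing brace and an opening brace. The class name FalseIsTrueExpanded and the run() helper are my own additions so the outcome can be checked; the layout is a hand expansion, not actual compiler output.

```java
public class FalseIsTrueExpanded {

    // Hand-expanded equivalent of the puzzle: the comment ends at the
    // translated line feed, the closing brace ends the (empty) if body,
    // and the opening brace starts a plain block that always runs.
    static String run() {
        StringBuilder out = new StringBuilder();
        if ( false == true ) { // the comment stops here
        }{
            out.append("false is true!");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The if statement still never executes its body. The println simply is not inside it any more.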
Mark Peters answered Sep 29 '22 18:09


Because they fall within the range of Unicode control characters, which is U+0000–U+001F and U+007F.

Unicode control characters are used to control the interpretation or display of text, but these characters themselves have no visual or spatial representation.

They can be escaped with a backslash, as described in the answer above by @Mark.

From the RFC:

2.5. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped.

Matas Vaitkevicius answered Sep 29 '22 19:09