Unicode escape behavior in Java programs

Tags:

java

A few days ago, I was asked about this program's output:

public static void main(String[] args) {
    // \u0022 is the Unicode escape for double quote (")
    System.out.println("a\u0022.length() + \u0022b".length());
}

My first thought was that this program should print the length of a\u0022.length() + \u0022b, which is 16, but surprisingly it printed 2. I know \u0022 is the Unicode escape for ", but I thought this " would be escaped and represent only a literal " character, with no special meaning. In reality, Java somehow parsed this string as follows:

System.out.println("a".length() + "b".length());

I can't wrap my head around this weird behavior. Why don't Unicode escapes behave like normal escape sequences?

Update: Apparently, this was one of the brain teasers from the book Java Puzzlers: Traps, Pitfalls, and Corner Cases by Joshua Bloch and Neal Gafter. More specifically, the question is Puzzle 14: Escape Rout.

481

asked Mar 09 '16 19:03

Ali Dehghani

1 Answer

Why don't Unicode escapes behave like normal escape sequences?

Basically, they're processed at a different point in reading the input: in lexing rather than parsing, if I've got my terminology right (the JLS makes it the very first translation step, before the input is split into tokens). They're not escape sequences in character literals or string literals, they're escape sequences for the whole source file. Any character that's not part of a Unicode escape sequence can be replaced with its Unicode escape sequence. So you can write programs entirely in ASCII which actually have non-ASCII variable, method and class names...
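To make the difference concrete, here is a minimal sketch (the class, method and field names are made up for illustration) that puts the puzzle's expression next to the same text written with ordinary string escapes, and also spells an identifier, a keyword and an operator with Unicode escapes:

```java
class UnicodeEscapeDemo {
    // Hypothetical field name: "caf" followed by U+00E9, spelled in pure ASCII.
    static int caf\u00e9 = 7;

    static int withUnicodeEscapes() {
        // Both escapes turn into double quotes before the compiler sees tokens,
        // so this is "a".length() + "b".length().
        return "a\u0022.length() + \u0022b".length();
    }

    static int withStringEscapes() {
        // Ordinary escapes are handled inside the string literal,
        // so this is a single 16-character string.
        return "a\".length() + \"b".length();
    }

    static int escapedKeywordAndOperator() {
        // U+0069 is 'i' and U+002B is '+', so the next line declares an int
        // and adds 1 to the field above.
        \u0069nt total = caf\u00e9 \u002b 1;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(withUnicodeEscapes());        // 2
        System.out.println(withStringEscapes());         // 16
        System.out.println(escapedKeywordAndOperator()); // 8
    }
}
```

The comments deliberately name code points as U+XXXX rather than using backslash-u, because an escape inside a comment would be translated too.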

Fundamentally I believe this was a design mistake in Java, as it can cause some very weird effects (e.g. if you have the escape sequence for a line break within a // comment...) but it is what it is...
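The comment case can be sketched like this (the class name is hypothetical). The escape for a line feed, U+000A, is translated before comments are recognized, so the comment ends early and the rest of the line is compiled as code:

```java
class CommentEscapeDemo {
    static int value() {
        int x = 1;
        // The escape ends this comment early: \u000a x = 42;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(value()); // 42, not 1
    }
}
```

Anything after the escape on that line has to be valid Java, otherwise the "comment" causes a compile error.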

This is detailed in section 3.3 of the JLS:

A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its input, translating the ASCII characters \u followed by four hexadecimal digits to the UTF-16 code unit (§3.1) for the indicated hexadecimal value, and passing all other characters unchanged. Representing supplementary characters requires two consecutive Unicode escapes. This translation step results in a sequence of Unicode input characters.

...

The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.

This transformed version is equally acceptable to a Java compiler and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.
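As a rough illustration of the Unicode-to-ASCII direction described above, here is a simplified sketch. `toAscii` is a hypothetical helper, not a JDK API. It assumes the eligibility rule from the same JLS section: a backslash starts a Unicode escape only when it is preceded by an even number of contiguous backslashes. Corner cases such as an escaped backslash are not handled.

```java
class AsciiTransform {
    /** Adds one extra 'u' to existing escapes and escapes every non-ASCII code unit. */
    static String toAscii(String source) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // contiguous backslashes directly before the current position
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\\') {
                out.append(c);
                boolean eligible = backslashes % 2 == 0;
                boolean startsEscape = i + 1 < source.length() && source.charAt(i + 1) == 'u';
                if (eligible && startsEscape) {
                    out.append('u'); // existing escape: one more 'u' marks it as original
                }
                backslashes++;
            } else {
                backslashes = 0;
                if (c < 128) {
                    out.append(c);
                } else {
                    // Non-ASCII UTF-16 code unit: write it as an escape with a single 'u'.
                    out.append(String.format("\\u%04x", (int) c));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An existing escape gains a second 'u'.
        System.out.println(toAscii("\\u0041"));
        // A non-ASCII character becomes a single-'u' escape.
        System.out.println(toAscii("caf" + (char) 0xE9));
    }
}
```

Because it works on UTF-16 code units, a supplementary character comes out as two consecutive escapes, which matches the JLS wording quoted earlier.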

108

answered Oct 30 '22 17:10

Jon Skeet
