Unicode escape syntax in Java

In Java, I learned that the following syntax can be used to write Unicode characters that are not on the keyboard (e.g. non-ASCII characters):

(\u)(u)*(HexDigit)(HexDigit)(HexDigit)(HexDigit)

My question is: what is the purpose of (u)* in the above syntax?

One use case that I do understand is representing the Yen symbol in Java:

char ch = '\u00A5';

asked Feb 03 '14 08:02

user3265048

2 Answers

Interesting question. Section 3.3 of the JLS says:

UnicodeEscape:
    \ UnicodeMarker HexDigit HexDigit HexDigit HexDigit

UnicodeMarker:
    u
    UnicodeMarker u

which translates to the regular expression \\u+\p{XDigit}{4}

and

If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.
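As a minimal sketch, the regular expression given above for the grammar can be tried out directly in Java. The class and method names here (UnicodeEscapeRegex, isUnicodeEscape) are made up for illustration, and the check only looks at the shape of the escape; it does not implement the JLS rule about which backslashes are eligible.

```java
import java.util.regex.Pattern;

class UnicodeEscapeRegex {
    // A backslash, one or more 'u', then exactly four hexadecimal digits.
    static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u+\\p{XDigit}{4}");

    // Hypothetical helper: true if the whole text has the shape of a Unicode escape.
    static boolean isUnicodeEscape(String text) {
        return UNICODE_ESCAPE.matcher(text).matches();
    }

    public static void main(String[] args) {
        // The doubled backslashes keep these strings as plain text
        // instead of letting the compiler translate them.
        System.out.println(isUnicodeEscape("\\u00A5"));    // true
        System.out.println(isUnicodeEscape("\\uuuu00A5")); // true
        System.out.println(isUnicodeEscape("\\u00A"));     // false, only three hex digits
        System.out.println(isUnicodeEscape("\\00A5"));     // false, no u marker
    }
}
```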

So you're right, there can be one or more u after the backslash. The reason is given further down:

The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.

This transformed version is equally acceptable to a Java compiler and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.

So this input

 \u0020ä

becomes

 \uu0020\u00e4

Here the uu in the first escape means "this was a Unicode escape sequence to begin with", while the single u in the second escape says "an automatic tool converted a non-ASCII character to a Unicode escape."

This information is useful when you want to convert back from ASCII to Unicode: you can restore as much of the original code as possible.
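The round trip described in the JLS passage can be sketched roughly as follows. This is not the code of any real tool: the class UnicodeAsciiSketch and its methods toAscii and fromAscii are hypothetical names, and as a simplification every backslash is treated as eligible, which the real rule does not do (a backslash preceded by an odd number of backslashes does not start an escape).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeAsciiSketch {
    // Group 1: the run of 'u' markers; group 2: the four hex digits.
    // Simplification (assumption): every backslash counts as eligible.
    private static final Pattern ESCAPE = Pattern.compile("\\\\(u+)(\\p{XDigit}{4})");

    // Unicode source -> ASCII: existing escapes gain one 'u',
    // non-ASCII characters become escapes with a single 'u'.
    static String toAscii(String source) {
        StringBuilder out = new StringBuilder();
        Matcher m = ESCAPE.matcher(source);
        int pos = 0;
        while (m.find()) {
            escapeNonAscii(source, pos, m.start(), out);
            out.append('\\').append('u').append(m.group(1)).append(m.group(2));
            pos = m.end();
        }
        escapeNonAscii(source, pos, source.length(), out);
        return out.toString();
    }

    // ASCII -> Unicode source: escapes with several 'u' lose one,
    // escapes with a single 'u' become the character itself.
    static String fromAscii(String ascii) {
        StringBuilder out = new StringBuilder();
        Matcher m = ESCAPE.matcher(ascii);
        int pos = 0;
        while (m.find()) {
            out.append(ascii, pos, m.start());
            String markers = m.group(1);
            if (markers.length() > 1) {
                out.append('\\').append(markers, 1, markers.length()).append(m.group(2));
            } else {
                out.append((char) Integer.parseInt(m.group(2), 16));
            }
            pos = m.end();
        }
        out.append(ascii, pos, ascii.length());
        return out.toString();
    }

    // Copies s[from, to) to out, escaping every UTF-16 code unit above 127.
    private static void escapeNonAscii(String s, int from, int to, StringBuilder out) {
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c > 127) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
    }

    public static void main(String[] args) {
        // Six characters of escape text followed by a real a-umlaut (code point 0xE4).
        String original = "\\u0020" + (char) 0xE4;
        String ascii = toAscii(original);
        System.out.println(ascii);
        System.out.println(fromAscii(ascii).equals(original)); // true
    }
}
```

With the input from the example above, toAscii yields the text \uu0020\u00e4, and fromAscii turns that back into the original string. Working on UTF-16 code units means a character outside the Basic Multilingual Plane comes out as two escapes, one per surrogate.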

129

answered Nov 04 '22 21:11

Aaron Digulla

It means you can add as many u as you want - for example these lines are equivalent:

char ch = '\u00A5';
char ch = '\uuuuu00A5';
char ch = '\uuuuuuuuuuuuuuuuuu00A5';

(and they all compile)
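To check this yourself, a small sketch like the following puts the three literals into separate constants (the class and constant names are invented for the example) and compares them at run time:

```java
class MultiUDemo {
    // The same Yen sign written with 1, 5 and 18 'u' markers.
    static final char ONE_U = '\u00A5';
    static final char FIVE_U = '\uuuuu00A5';
    static final char MANY_U = '\uuuuuuuuuuuuuuuuuu00A5';

    public static void main(String[] args) {
        System.out.println(ONE_U == FIVE_U && FIVE_U == MANY_U); // true
        // Print the numeric value so the result does not depend on console encoding.
        System.out.println((int) ONE_U); // 165
    }
}
```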

answered Nov 04 '22 22:11

assylias
