What is the difference between using \u and \x when representing a character literal?

I have seen \u and \x used interchangeably in some places while representing a character literal.

For example, '\u00A9' == '\x00A9' evaluates to true.

Aren't we supposed to use only \u to represent Unicode characters? What is the use of having two ways to represent a character?

Arnab asked Aug 24 '15 06:08


1 Answer

I would strongly recommend only using \u, as it's much less error-prone.

\x consumes 1-4 characters, so long as they're hex digits - whereas \u must always be followed by 4 hex digits. From the C# 5 specification, section 2.4.4.4, the grammar for \x:

hexadecimal-escape-sequence:
  \x hex-digit hex-digit(opt) hex-digit(opt) hex-digit(opt)

So for example:

string good = "Tab\x9Good compiler";
string bad = "Tab\x9Bad compiler";

... look similar but are very different strings, as the latter is effectively "Tab" followed by U+9BAD followed by " compiler".
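To make the difference visible, here is a minimal sketch that prints the length of each string and the code unit sitting right after "Tab". The class name HexEscapeDemo and the helper CodeUnits are made up for this illustration; they are not part of any library.

```csharp
using System;

public static class HexEscapeDemo
{
    // Hypothetical helper: formats each UTF-16 code unit of s as U+XXXX.
    public static string CodeUnits(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            parts[i] = "U+" + ((int)s[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        string good = "Tab\x9Good compiler"; // \x9 stops at 'G', which is not a hex digit
        string bad = "Tab\x9Bad compiler";   // \x9Bad swallows 'B', 'a' and 'd' as hex digits

        Console.WriteLine(good.Length);                     // 17
        Console.WriteLine(bad.Length);                      // 13
        Console.WriteLine(CodeUnits(good.Substring(3, 1))); // U+0009
        Console.WriteLine(CodeUnits(bad.Substring(3, 1)));  // U+9BAD
    }
}
```

Writing the tab as \u0009 (or just \t) avoids the problem, because the escape then has a fixed length regardless of what follows it.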

Personally I wish the C# language had never included \x, but there we go.

Note that there's also \U, which is always followed by 8 hex digits, primarily used for non-BMP characters.
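As a rough illustration of \U, the sketch below uses U+1F600 as an arbitrary non-BMP code point (my choice, not something from the question). The class and field names are made up for the example.

```csharp
using System;

public static class LongEscapeDemo
{
    // U+1F600 lies outside the BMP, so it is stored as a surrogate pair.
    public static readonly string Grin = "\U0001F600";

    // \U also works for BMP characters; the eight digits are just zero-padded.
    public static readonly string Copyright = "\U000000A9";

    public static void Main()
    {
        Console.WriteLine(Grin.Length);                                // 2
        Console.WriteLine(char.IsSurrogatePair(Grin, 0));              // True
        Console.WriteLine(char.ConvertToUtf32(Grin, 0).ToString("X")); // 1F600
        Console.WriteLine(Copyright == "\u00A9");                      // True
    }
}
```

Because a non-BMP character needs two UTF-16 code units, this form is useful in string literals; a single char cannot hold such a value.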

There's one other big difference between \u and \x: the latter is only used in character and string literals, whereas \u can also be used in identifiers:

string x = "just a normal string";
Console.WriteLine(\u0078); // Still refers to the identifier x

Jon Skeet answered Sep 19 '22 12:09