
C++11: Example of difference between ordinary string literal and UTF-8 string literal?

A string literal that does not begin with an encoding-prefix is an ordinary string literal, and is initialized with the given characters.

A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8.

I don't understand the difference between an ordinary string literal and a UTF-8 string literal.

Can someone provide an example of a situation where they are different (that is, where they cause different compiler output)?

(I mean from the POV of the standard, not any particular implementation)

Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set.

Andrew Tomazos asked Feb 04 '13 02:02




1 Answer

The C and C++ languages allow their implementations a huge amount of latitude. C was designed long before UTF-8 was "the way to encode text as a sequence of bytes": different systems had different text encodings.

So the byte values of a string in C and C++ are really up to the compiler. 'A' is whatever the compiler's chosen encoding gives for the character A, which may not agree with UTF-8.

C++11 added the requirement that compilers must support real UTF-8 string literals. The byte value of u8"A"[0] is fixed by the C++ standard through the UTF-8 standard (it is 0x41), regardless of the preferred encoding of the platform the compiler is targeting.
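As a minimal sketch of that guarantee, the check below can be evaluated at compile time. The helper first_code_unit is a hypothetical name introduced for illustration; it is a template only so that the same code compiles both where u8 literals have type const char[] (C++11 to C++17) and where they have type const char8_t[] (C++20 and later).

```cpp
// first_code_unit is a hypothetical helper, not a standard function.
// It returns the first code unit of a string as an unsigned value.
template <typename CharT>
constexpr unsigned first_code_unit(const CharT* s) {
    return static_cast<unsigned char>(s[0]);
}

// Guaranteed on every conforming implementation:
// UTF-8 encodes U+0041 (LATIN CAPITAL LETTER A) as the single byte 0x41.
static_assert(first_code_unit(u8"A") == 0x41, "u8\"A\"[0] must be 0x41");

// Not guaranteed: this value depends on the execution character set.
// It is 0x41 on ASCII-compatible targets and 0xC1 on an EBCDIC target.
constexpr unsigned ordinary_a = first_code_unit("A");
```

Only the u8 check is written as a static_assert, because it is the only one the standard lets you rely on everywhere.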

Now, much as most platforms C++ targets use two's complement integers, most compilers use character encodings that are mostly compatible with UTF-8. So for a string like "hello world", the ordinary literal and u8"hello world" will almost certainly be byte-for-byte identical.
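One way to check this on a given compiler is to compare the two literals code unit by code unit. This is only a sketch: same_units and same_bytes are hypothetical helper names, written in C++11 constexpr style and as templates so they accept u8 literals whether their element type is char or char8_t.

```cpp
#include <cstddef>

// Hypothetical helper: compares the first n code units of two strings.
template <typename A, typename B>
constexpr bool same_units(const A* a, const B* b, std::size_t n) {
    return n == 0 ||
           (static_cast<unsigned char>(*a) == static_cast<unsigned char>(*b) &&
            same_units(a + 1, b + 1, n - 1));
}

// Hypothetical helper: true when two string literals have the same length
// and the same code units, including the terminating null.
template <typename A, typename B, std::size_t N, std::size_t M>
constexpr bool same_bytes(const A (&a)[N], const B (&b)[M]) {
    return N == M && same_units(a, b, N);
}

// True when the execution character set agrees with UTF-8 for these
// characters, which is the common case but not something the standard promises.
constexpr bool hello_matches = same_bytes("hello world", u8"hello world");
```

The result is stored in a constant rather than asserted, since an implementation with a non-ASCII execution character set could legitimately yield false.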

For a concrete example, from man gcc:

-fexec-charset=charset

Set the execution character set, used for string and character constants. The default is UTF-8. charset can be any encoding supported by the system's iconv library routine.

-finput-charset=charset

Set the input character set, used for translation from the character set of the input file to the source character set used by GCC. If the locale does not specify, or GCC cannot get this information from the locale, the default is UTF-8. This can be overridden by either the locale or this command line option. Currently the command line option takes precedence if there's a conflict. charset can be any encoding supported by the system's iconv library routine.

These options are an example of being able to change the execution and input character sets of C/C++.
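To see the two kinds of literal diverge, a non-ASCII character is needed. The sketch below assumes GCC and a hypothetical file name demo.cpp; it writes U+00E9 as a universal-character-name so the result does not depend on how the source file itself is encoded. The helper unit_at is also a hypothetical name.

```cpp
// Build twice and compare, assuming GCC and a file named demo.cpp:
//   g++ -std=c++11 -c demo.cpp
//   g++ -std=c++11 -fexec-charset=ISO-8859-1 -c demo.cpp
#include <cstddef>

// Hypothetical helper: code unit i of a string, as an unsigned value.
template <typename CharT>
constexpr unsigned unit_at(const CharT* s, std::size_t i) {
    return static_cast<unsigned char>(s[i]);
}

// The UTF-8 literal is always the bytes 0xC3 0xA9 plus the terminating null,
// whatever execution character set is selected.
static_assert(sizeof(u8"\u00e9") == 3, "UTF-8 encodes U+00E9 in two bytes");
static_assert(unit_at(u8"\u00e9", 0) == 0xC3, "first UTF-8 byte of U+00E9");
static_assert(unit_at(u8"\u00e9", 1) == 0xA9, "second UTF-8 byte of U+00E9");

// The ordinary literal follows the execution character set: with GCC's
// default (UTF-8) this is expected to be 3, and with
// -fexec-charset=ISO-8859-1 it is expected to be 2 (byte 0xE9 plus the null).
constexpr std::size_t ordinary_size = sizeof("\u00e9");
```

Other compilers may not offer an equivalent of -fexec-charset, or may accept only UTF-8 for it, so treat the second build command as GCC-specific.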

Yakk - Adam Nevraumont answered Oct 05 '22 10:10