
Signedness of char and Unicode in C++0x

From the C++0x working draft, the new char types (char16_t and char32_t) for handling Unicode will be unsigned (uint_least16_t and uint_least32_t will be the underlying types).

But as far as I can see (which is perhaps not very far), no char8_t type (based on uint_least8_t) is defined. Why?

And it's even more confusing when you see that a new u8 encoding prefix is introduced for UTF-8 string literals, yet those literals are based on our old friend char, which may be signed or unsigned. Why?

Update: There is now a proposal to add a new type, char8_t:

char8_t: A type for UTF-8 characters and strings (Revision 1) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r1.html

anno asked Mar 06 '10 03:03


1 Answer

char will be the type used for UTF-8, because its definition has been changed to guarantee that it can hold UTF-8 code units:

For the purpose of enhancing support for Unicode in C++ compilers, the definition of the type char has been modified to be both at least the size necessary to store an eight-bit coding of UTF-8 and large enough to contain any member of the compiler's basic execution character set. It was previously defined as only the latter. There are three Unicode encodings that C++0x will support: UTF-8, UTF-16, and UTF-32. In addition to the previously noted changes to the definition of char, C++0x will add two new character types: char16_t and char32_t. These are designed to store UTF-16 and UTF-32 respectively.

Source : http://en.wikipedia.org/wiki/C%2B%2B0x

Most UTF-8 applications on PC/Mac already use char anyway.

Klaim answered Nov 07 '22 09:11