According to the C++0x working draft, the new character types for Unicode (char16_t and char32_t) will be unsigned, with uint_least16_t and uint_least32_t as their underlying types.
But as far as I can see (not very far, perhaps), no char8_t type (based on uint_least8_t) is defined. Why?
And it's even more confusing when you see that a new u8 encoding prefix is introduced for UTF-8 string literals, based on our old friend (signed/unsigned) char. Why?
Update: There's a proposal to add a new type, char8_t:
char8_t: A type for UTF-8 characters and strings (Revision 1) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r1.html
char will be the type used for UTF-8 because its definition has been changed to guarantee that it can hold UTF-8 code units:
For the purpose of enhancing support for Unicode in C++ compilers, the definition of the type char has been modified to be both at least the size necessary to store an eight-bit coding of UTF-8 and large enough to contain any member of the compiler's basic execution character set. It was previously defined as only the latter. There are three Unicode encodings that C++0x will support: UTF-8, UTF-16, and UTF-32. In addition to the previously noted changes to the definition of char, C++0x will add two new character types: char16_t and char32_t. These are designed to store UTF-16 and UTF-32 respectively.
Source : http://en.wikipedia.org/wiki/C%2B%2B0x
Most UTF-8 applications on PC/Mac already use char anyway.