In the C++ reference documentation, I noticed this for char:
"The character types are large enough to represent any UTF-8 eight-bit code unit (since C++14)"
and this for char8_t:
"type for UTF-8 character representation, required to be large enough to represent any UTF-8 code unit (8 bits)"
Does that mean both are the same type? Or does char8_t have some other feature?
Disclaimer: I'm the author of the char8_t proposals P0482 and P1423.
In C++20, char8_t is a distinct type from all other types. In the related proposal for C, N2653, char8_t is a typedef of unsigned char, similar to the existing typedefs for char16_t and char32_t.
In C++20, char8_t has an underlying representation that matches unsigned char. It therefore has the same size (at least 8-bit, but may be larger), alignment, and integer conversion rank as unsigned char, but has different aliasing rules.
In particular, char8_t was not added to the list of types at [basic.lval]p11, [basic.life]p6.4, [basic.types]p2, or [basic.types]p4. This means that, unlike unsigned char, it cannot be used for the underlying storage of objects of another type, nor can it be used to examine the underlying representation of objects of other types; in other words, it cannot be used to alias other types. A consequence of this is that objects of type char8_t can be accessed via pointers to char or unsigned char, but pointers to char8_t cannot be used to access char or unsigned char data. In other words:
reinterpret_cast<const char *>(u8"text"); // Ok.
reinterpret_cast<const char8_t*>("text"); // Undefined behavior once the result is used to access the char data.
The motivation for a distinct type with these properties is:
To provide a distinct type for UTF-8 character data vs character data with an encoding that is either locale dependent or that requires separate specification.
To enable overloading for ordinary string literals vs UTF-8 string literals (since they may have different encodings).
To ensure an unsigned type for UTF-8 data (whether char is signed or unsigned is implementation defined).
To enable better performance via a non-aliasing type; optimizers can better optimize types that do not alias other types.