NB: I'm sure someone will call this subjective, but I reckon it's fairly tangible.

C++11 gives us new basic_string types std::u16string and std::u32string, type aliases for std::basic_string<char16_t> and std::basic_string<char32_t>, respectively.
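For concreteness, here is roughly what those aliases amount to and how the matching string literals produce them (the variable names are mine):

    #include <string>

    // The C++11 aliases are, in effect:
    //   namespace std {
    //     typedef basic_string<char16_t> u16string;
    //     typedef basic_string<char32_t> u32string;
    //   }
    std::u16string a = u"text";  // u"" literals have type const char16_t[N]
    std::u32string b = U"text";  // U"" literals have type const char32_t[N]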
The use of the substrings "u16"
and "u32"
to me in this context rather implies "UTF-16" and "UTF-32", which would be silly since C++ of course has no concept of text encodings.
The names in fact reflect the character types char16_t
and char32_t
, but these seem misnamed. They are unsigned, due to the unsignedness of their underlying types:
[C++11: 3.9.1/5]: [..] Types char16_t and char32_t denote distinct types with the same size, signedness, and alignment as uint_least16_t and uint_least32_t, respectively [..]
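A few static_asserts make the quoted wording concrete (a minimal sketch; it should compile on any conforming C++11 implementation):

    #include <cstdint>
    #include <type_traits>

    // char16_t/char32_t share size and signedness with the least-width
    // unsigned typedefs, yet remain distinct types in their own right.
    static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t), "same size");
    static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t), "same size");
    static_assert(std::is_unsigned<char16_t>::value, "char16_t is unsigned");
    static_assert(std::is_unsigned<char32_t>::value, "char32_t is unsigned");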
But then it seems to me that these names violate the convention that such unsigned types have names beginning with 'u', and that the use of bare widths like 16, unqualified by terms like least, indicates fixed-width types.
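That convention, for reference (these are the <cstdint> typedefs the question is comparing against):

    #include <cstdint>

    std::uint16_t       a = 0;  // exactly 16 bits, unsigned; optional in the standard
    std::uint_least16_t b = 0;  // at least 16 bits, unsigned; always provided
    std::int_fast32_t   c = 0;  // fastest signed type of at least 32 bits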
My question, then, is this: am I imagining things, or are these names fundamentally flawed?
The naming convention to which you refer (uint32_t, int_fast32_t, etc.) is actually only used for typedefs, and not for primitive types. The primitive integer types are {signed, unsigned} {char, short, int, long, long long} (as opposed to floating-point or decimal types) ...
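To illustrate the difference: uint32_t is merely an alias for one of those primitive types, whereas char16_t is a keyword naming its own fundamental type. A small sketch; the assertion holds on any conforming implementation:

    #include <cstdint>
    #include <type_traits>

    // A typedef such as uint_least16_t never introduces a new type, so
    // char16_t, being a distinct fundamental type, can never be the same
    // type as it, even though the two have identical size and signedness.
    static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
                  "char16_t is a distinct type, not a typedef");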
However, in addition to those integer types, there are four distinct, unique, fundamental types, char, wchar_t, char16_t and char32_t, which are the types of the respective literals 'a', L'a', u'a' and U'a', and which are used for alphanumeric data, and similarly for arrays of those. Those types are of course also integer types, and thus they will have the same layout as some of the arithmetic integer types, but the language makes a very clear distinction between the former, arithmetic types (which you would use for computations) and the latter "character" types, which form the basic unit of some type of I/O data.
(I've previously rambled about those new types here and here.)
So, I think that char16_t and char32_t are actually very aptly named to reflect the fact that they belong to the "char" family of integer types.
"are these names fundamentally flawed?"
(I think most of this question has been answered in the comments, but to make an answer:) No, not at all. char16_t and char32_t were created for a specific purpose: to provide data-type support for all Unicode encoding formats (UTF-8 is covered by char) while keeping them as generic as possible, so as not to limit them to Unicode alone. Whether they are unsigned or have a fixed width is not directly related to what they are: character data types, i.e. types which hold and represent characters. Signedness is a property of data types that represent numbers, not characters. The types are meant to store character data, either 16-bit or 32-bit, nothing more and nothing less.
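As a small illustration of treating them as character types rather than numbers: because they are distinct types, overloads can separate character data from numeric data. This is a hypothetical overload set, purely for illustration:

    #include <iostream>

    void describe(char16_t)       { std::cout << "16-bit character data\n"; }
    void describe(char32_t)       { std::cout << "32-bit character data\n"; }
    void describe(unsigned short) { std::cout << "just a number\n"; }

    int main() {
        describe(u'x');                            // picks the char16_t overload
        describe(U'x');                            // picks the char32_t overload
        describe(static_cast<unsigned short>(7));  // the arithmetic overload
    }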