C++11 introduces char16_t and char32_t to facilitate working with UTF-16- and UTF-32-encoded text strings. But the <iostream> library still only supports the implementation-defined wchar_t for multi-byte I/O.

Why has support for char16_t and char32_t not been added to the <iostream> library to complement the wchar_t support?
If a U16 length modifier is present, the argument shall be a character of type char16_t. (The allowable range for UTF-16 code units is 0x0-0xFFFF, excluding the surrogate range 0xD800-0xDFFF, inclusive.)
char16_t is an unsigned integer type used for 16-bit wide characters; in C it is the same type as uint_least16_t, the smallest unsigned integer type with a width of at least 16 bits. (In C++, by contrast, char16_t is a distinct built-in type.)
char16_t and char32_t are specified in the C standard. (Citations below are from the 2018 standard.) Per clause 7.28, the header <uchar.h> declares them as unsigned integer types to be used for 16-bit and 32-bit characters, respectively.
In the proposal Minimal Unicode support for the standard library (revision 2), it is indicated that the Library Working Group only supported adding the new character types to strings and codecvt facets. Apparently the majority was opposed to supporting them in iostream, fstream, facets other than codecvt, and regex.
According to minutes from the Portland meeting in 2006, "the LWG is committed to full support of Unicode, but does not intend to duplicate the library with Unicode character variants of existing library facilities." I haven't found any details, but I would guess the committee feels the current library interface is inappropriate for Unicode. One possible complaint is that it was designed with fixed-size characters in mind, an assumption Unicode renders obsolete: even when Unicode data uses fixed-size code points, a character is not limited to a single code point.
Personally, I think there's no reason not to standardize the minimal support that's already provided on various platforms (Windows uses UTF-16 for wchar_t; most Unix platforms use UTF-32). More advanced Unicode support will require new library facilities, but supporting char16_t and char32_t in iostreams and facets wouldn't get in the way, and it would enable basic Unicode I/O.