 

Using char16_t and char32_t in I/O


C++11 introduces char16_t and char32_t to facilitate working with UTF-16- and UTF-32-encoded text strings. But the <iostream> library still supports only the implementation-defined wchar_t for wide-character I/O.

Why has support for char16_t and char32_t not been added to the <iostream> library to complement the wchar_t support?
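To make the asymmetry concrete, here is a minimal sketch (my own, not part of the question): strings of the new character types work, but there is no stream counterpart to std::wcout for them. The name "std::u16cout" below is hypothetical; nothing like it exists.

    #include <iostream>
    #include <string>

    int main() {
        std::u16string s16 = u"caf\u00e9";   // char16_t string: supported
        std::u32string s32 = U"caf\u00e9";   // char32_t string: supported

        std::cout << s16.size() << ' ' << s32.size() << '\n';  // prints "4 4"

        // Wide streams exist for wchar_t (std::wcout, std::wofstream, ...),
        // but there is no std::basic_ostream<char16_t> or <char32_t>
        // counterpart, so nothing like a hypothetical "std::u16cout << s16"
        // is available.
    }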

asked Nov 17 '11 by oz1cz

People also ask

What is the range of char16_t?

If a U16 length modifier is present, the argument shall be a character of char16_t type. (The allowable range for UTF-16 code units is 0x0-0xFFFF, except the surrogate range 0xD800-0xDFFF, inclusive.)
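As a small illustration of that range restriction (my own sketch, not part of the quoted text), a lone char16_t unit is a valid Unicode scalar value only if it falls outside the surrogate range:

    #include <cassert>

    // True if c lies in the UTF-16 surrogate range 0xD800-0xDFFF; such
    // units may only appear as halves of a surrogate pair, never alone.
    constexpr bool is_surrogate(char16_t c) {
        return c >= 0xD800 && c <= 0xDFFF;
    }

    int main() {
        static_assert(!is_surrogate(u'A'), "ASCII is fine");
        static_assert(is_surrogate(0xD800), "lead surrogate");
        assert(is_surrogate(0xDFFF));
    }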

What is char16_t in C?

char16_t is an unsigned integer type used for 16-bit wide characters and is the same type as uint_least16_t. uint_least16_t is the smallest unsigned integer type with a width of at least 16 bits.
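In C++, by contrast, char16_t is a distinct built-in type, though it is required to have the same size, signedness, and alignment as uint_least16_t. A quick sketch verifies both properties:

    #include <cstdint>
    #include <type_traits>

    // char16_t's underlying type is uint_least16_t, but in C++ it is a
    // distinct type (unlike in C, where it is a typedef).
    static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t),
                  "same size as uint_least16_t");
    static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
                  "but a distinct type in C++");

    int main() {}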

Where is char16_t defined?

char16_t and char32_t are specified in the C standard. (Citations below are from the 2018 standard.) Per clause 7.28, the header <uchar.h> declares them as unsigned integer types to be used for 16-bit and 32-bit characters, respectively.
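C++ provides the same declarations through <cuchar>. As a rough sketch of using them (the locale name below is an assumption; a UTF-8 locale must be available on the host), mbrtoc16 from that header converts multibyte input to char16_t units:

    #include <cuchar>
    #include <clocale>
    #include <cstdio>
    #include <cstring>

    int main() {
        // Assumption: this locale name exists on the host system.
        std::setlocale(LC_ALL, "en_US.UTF-8");

        const char* in  = "caf\xc3\xa9";          // "café" as UTF-8 bytes
        const char* end = in + std::strlen(in);
        std::mbstate_t state{};
        char16_t c16;

        while (in < end) {
            std::size_t n = std::mbrtoc16(&c16, in, end - in, &state);
            if (n == (std::size_t)-1 || n == (std::size_t)-2)
                break;                            // invalid or incomplete input
            std::printf("0x%04X\n", (unsigned)c16);
            if (n == (std::size_t)-3)
                continue;                         // trailing surrogate, no bytes consumed
            if (n == 0)
                break;                            // converted the terminating null
            in += n;
        }
    }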


1 Answer

The proposal Minimal Unicode support for the standard library (revision 2) indicates that the Library Working Group supported the new character types only in strings and codecvt facets. Apparently the majority was opposed to supporting them in iostream, fstream, facets other than codecvt, and regex.
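That accepted scope is visible in C++11. For example, std::wstring_convert together with a codecvt facet can transcode char16_t strings; this is a minimal sketch of it (note that <codecvt> and std::wstring_convert were later deprecated in C++17):

    #include <codecvt>
    #include <locale>
    #include <string>
    #include <iostream>

    int main() {
        // UTF-16 <-> UTF-8 via the codecvt machinery the LWG did accept.
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;

        std::u16string u16   = u"caf\u00e9";
        std::string    bytes = conv.to_bytes(u16);   // char16_t -> UTF-8 bytes

        std::cout << bytes << '\n';   // the byte string can use narrow I/O
    }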

According to the minutes from the Portland meeting in 2006, "the LWG is committed to full support of Unicode, but does not intend to duplicate the library with Unicode character variants of existing library facilities." I haven't found any further details, but I would guess that the committee feels the current library interface is inappropriate for Unicode. One possible complaint is that the interface was designed with fixed-size characters in mind, an assumption Unicode invalidates: even when Unicode data uses fixed-size code points, a single character is not limited to a single code point.
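A concrete example of that last point (my own, using a combining mark): even in fixed-size UTF-32, one user-perceived character can occupy several code points.

    #include <string>
    #include <iostream>

    int main() {
        std::u32string precomposed = U"\u00E9";   // é as a single code point
        std::u32string decomposed  = U"e\u0301";  // 'e' + combining acute accent

        std::cout << precomposed.size() << '\n';  // 1 code point
        std::cout << decomposed.size()  << '\n';  // 2 code points, same character
    }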

Personally I think there's no reason not to standardize the minimal support that's already provided on various platforms (Windows uses UTF-16 for wchar_t; most Unix platforms use UTF-32). More advanced Unicode support will require new library facilities, but supporting char16_t and char32_t in iostreams and facets wouldn't get in the way of that, and it would enable basic Unicode I/O.
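That platform split is easy to observe with a trivial sketch (the sizes shown are typical, not guaranteed):

    #include <iostream>

    int main() {
        // Typically 2 on Windows (UTF-16) and 4 on most Unix platforms (UTF-32).
        std::cout << "sizeof(wchar_t) = " << sizeof(wchar_t) << '\n';
    }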

answered Sep 26 '22 by bames53