The new C++11 standard mentions a header <cuchar>, presumably in analogy to C99's <uchar.h>.
Now, we know that C++11 brings new character types and literals that are specifically designed for UTF-16 and UTF-32, but I didn't think the language would actually contain functions to convert the (system-dependent) narrow multibyte encoding to one of the Unicode encodings. However, I just came across the header synopsis for <cuchar>, which mentions the functions mbrtoc16/c16rtomb and mbrtoc32/c32rtomb that seem to do just that.
Unfortunately, the standard says nothing about those functions beyond the header synopsis. Where are those functions defined, what do they really do and where can I read more about them? Does this mean that one can use proper Unicode entirely with standard C++ now, without the need for any extra libraries?
These were described in a WG21 paper from 2005 but the description is not present in the final standard. They are documented in ISO/IEC 19769:2004 (Extensions for the programming language C to support new character data types) (draft), which the C++11 standard refers to.
The text is too long to post here, but these are the signatures:
size_t mbrtoc16(char16_t * pc16, const char * s, size_t n, mbstate_t * ps);
size_t c16rtomb(char * s, char16_t c16, mbstate_t * ps);
size_t mbrtoc32(char32_t * pc32, const char * s, size_t n, mbstate_t * ps);
size_t c32rtomb(char * s, char32_t c32, mbstate_t * ps);
The functions convert between multibyte characters and UTF-16 or UTF-32 characters, respectively, similar to mbrtowc. There are no non-reentrant versions, and honestly, who needs them?
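To make the conversion concrete, here is a minimal sketch of a round trip through mbrtoc32 and c32rtomb. The helper names decode_multibyte and encode_multibyte are hypothetical, not part of any standard. The sketch assumes the input is valid in the current C locale's multibyte encoding, assumes a return value of 0 (a decoded null character) corresponds to one consumed byte, and simply stops at the first invalid or incomplete sequence instead of reporting the error.

```cpp
#include <climits>   // MB_LEN_MAX
#include <cstddef>   // std::size_t
#include <cuchar>    // std::mbrtoc32, std::c32rtomb
#include <cwchar>    // std::mbstate_t
#include <string>

// Hypothetical helper: decode a narrow multibyte string (in the encoding of
// the current C locale) into UTF-32 code points.
std::u32string decode_multibyte(const std::string& in) {
    std::u32string out;
    std::mbstate_t state{};              // zero-initialized = initial conversion state
    const char* p = in.data();
    const char* end = p + in.size();
    while (p < end) {
        char32_t c32 = 0;
        std::size_t rc = std::mbrtoc32(&c32, p, static_cast<std::size_t>(end - p), &state);
        if (rc == static_cast<std::size_t>(-1) ||   // invalid sequence
            rc == static_cast<std::size_t>(-2)) {   // incomplete sequence
            break;                                  // sketch: stop instead of reporting
        }
        out.push_back(c32);
        if (rc == static_cast<std::size_t>(-3)) {
            continue;                    // character came from saved state, no bytes consumed
        }
        p += (rc == 0 ? 1 : rc);         // assumption: a null character occupies one byte
    }
    return out;
}

// Hypothetical helper: encode UTF-32 code points back into the locale's
// narrow multibyte encoding.
std::string encode_multibyte(const std::u32string& in) {
    std::string out;
    std::mbstate_t state{};
    char buf[MB_LEN_MAX];                // large enough for any single character
    for (char32_t c : in) {
        std::size_t rc = std::c32rtomb(buf, c, &state);
        if (rc == static_cast<std::size_t>(-1)) {
            break;                       // code point not representable in this locale
        }
        out.append(buf, rc);
    }
    return out;
}
```

What the narrow side means depends on the locale selected with setlocale. In the default "C" locale only plain ASCII input is safe to rely on, so these functions alone do not guarantee UTF-8 on the narrow side.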
Probably the best documentation of which I'm aware is in n1326, the proposal to add TR19769 to the C standard library [Edit: though looking at it, the N1010 that R. Martinho Fernandes cited seems to have pretty much the same].