I am wondering, in C++, how can we support UTF-8 encoding? I think C++ only supports char and wchar_t, so how can UTF-8 be supported?
UTF-8 is supported just fine; UTF-8 uses eight-bit code units to represent characters, with each character encoded as one or more bytes. The standard guarantees that char is at least eight bits wide, so every conforming C++ implementation can read, write, and process UTF-8 text. Since 7-bit ASCII is a strict subset of UTF-8, conversion between ASCII char strings and UTF-8 is also not a problem.
What is a problem is converting between other encodings (code pages such as Latin-1 or other Unicode encodings such as UTF-16, UCS-2, UTF-32 and UCS-4) and UTF-8. Here's a rough outline of the situation:
- C++98 introduced the wchar_t type and wide-string literals of the form L"XXX", but left most of the details implementation-defined. So VC++ treats wchar_t as 16 bits and encodes wide-string literals as UTF-16, while GCC treats wchar_t as 32 bits and encodes wide-string literals as UTF-32.
- C++11 added char16_t and char32_t, as well as 16- and 32-bit string literals of the form u"XXX" and U"XXX". These, however, are not yet supported by VC++ (GCC has them).
- For converting between encodings, the standard library provides the codecvt facet template. This was added in C++98, but support has been spotty, to say the least. Today, VC++ seems to have reasonable support, but GCC's support is lacking.