I have been exploring C++11's new Unicode functionality, and while other C++11 encoding questions have been very helpful, I have a question about the following code snippet from cppreference. The code writes and then immediately reads a text file saved with UTF-8 encoding.
#include <fstream>
#include <iostream>
#include <locale>
int main()
{
    // Write
    std::ofstream("text.txt") << u8"z\u6c34\U0001d10b";
    // Read
    std::wifstream file1("text.txt");
    file1.imbue(std::locale("en_US.UTF8"));
    std::cout << "Normal read from file (using default UTF-8/UTF-32 codecvt)\n";
    for (wchar_t c; file1 >> c; ) // ?
        std::cout << std::hex << std::showbase << c << '\n';
}
My question is quite simply: why is a wchar_t needed in the for loop? A u8 string literal can be declared using a simple char *, and the bit layout of the UTF-8 encoding should tell the system the character's width. It appears there is some automatic conversion from UTF-8 to UTF-32 (hence the wchar_t), but if this is the case, why is the conversion necessary?
Most C string library routines still work with UTF-8, since they only scan for terminating NUL characters.
UTF-8 is a valid IANA character set name, whereas utf8 is not; it is not even a valid alias. The locale name in the snippet refers to an implementation-provided locale, where the settings of language, territory, and codeset are implementation-defined.
You use wchar_t because you're reading the file using wifstream; if you were reading using ifstream you'd use char, and similarly for char16_t and char32_t.
Assuming (as the example does) that wchar_t is 32-bit, and that the native character set that it represents is UTF-32 (UCS-4), then this is the simplest way to read a file as UTF-32; it is presented as such in the example for contrast to reading a file as UTF-16. A more portable method would be to use basic_ifstream<char32_t> and std::codecvt_utf8<char32_t> explicitly, as this is guaranteed to convert from a UTF-8 input stream to UTF-32 elements.
The idea of the cppreference snippet you quoted is to show how to read a UTF-8 file into a wide-character string (UTF-32 here, given the 32-bit wchar_t the example assumes). That is why it writes the file using an ofstream but reads it using a wifstream (hence the wchar_t).