I have some text files which are encoded using UTF-8. Is there a way to read them using C++ stream classes (wifstream, for example)?
I have seen some external references like Boost and some CodeProject code snippets, but I don't want to pull those in just for this purpose.
On Linux it somehow works by calling imbue(std::locale("en_US")), but not on Windows. I think the problem is that Windows assumes wifstream to be a UTF-16 encoded stream. Can't I specify the Unicode encoding with the wifstream class somehow so that it uses UTF-8, not UTF-16?
In addition to just reading the bytes from the file normally, and treating them as UTF-8 (e.g., by not passing them to anything that expects locale encoded strings, only to things that expect UTF-8), Windows has another way to read in UTF-8.
You can set a 'UTF-8' mode on file descriptors, and then use wide character input and output on that file descriptor and Microsoft's C runtime will handle transforming the wide characters to and from UTF-8 encoded byte streams:
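#include <fcntl.h>   // _O_U8TEXT
#include <io.h>      // _setmode
#include <stdio.h>   // _fileno, wprintf

int main(void) {
    // Put stdout into UTF-8 mode: wide output is converted to UTF-8 bytes.
    _setmode(_fileno(stdout), _O_U8TEXT);
    // Cyrillic and CJK text, written as escapes so the source encoding doesn't matter.
    wprintf(L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd\n");
    return 0;
}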
If you run the above program with output redirected to a file you will get a UTF-8 encoded file.
Setting one of these Unicode modes on a file descriptor has the additional effect on consoles that wide character output will actually work on the console. I'm not sure why exactly Microsoft chose "broken" as the default, but at least there's a way to enable a "not broken" mode.
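For the first approach mentioned above (reading the bytes normally and just treating them as UTF-8), here is a minimal portable sketch. The helper names read_utf8_file and count_code_points are made up for illustration, and the code assumes the file holds valid UTF-8 with no byte order mark; it does no validation.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read the whole file as raw bytes.
// Binary mode avoids newline translation; no locale or wide-character
// conversion is applied, so the UTF-8 bytes arrive unchanged.
// Returns an empty string if the file cannot be opened.
std::string read_utf8_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: count code points by skipping UTF-8
// continuation bytes (those of the form 10xxxxxx).
// Assumes valid UTF-8 input; malformed sequences are not detected.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

The std::string returned here is just a byte container, so size() gives the byte count while count_code_points gives the number of encoded characters. This only stays correct as long as the string is passed to code that expects UTF-8 rather than the locale's narrow encoding.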