On Linux with g++, if I set a UTF-8 global locale, then wcin correctly transcodes UTF-8 input to the internal wchar_t encoding.
However, if I keep the classic locale and imbue a UTF-8 locale into wcin, this doesn't happen: input either fails altogether, or each individual byte is converted to a wchar_t independently.
With clang++ and libc++, neither setting the global locale nor imbuing the locale into wcin works.
#include <iostream>
#include <locale>
#include <string>
using namespace std;

int main() {
    if (true)
        // this works with g++, but not with clang++/libc++
        locale::global(locale("C.UTF-8"));
    else
        // this doesn't work with either implementation
        wcin.imbue(locale("C.UTF-8"));
    wstring s;
    wcin >> s;
    cout << s.length() << " " << (s == L"áéú");
    return 0;
}
The input stream contains only the characters áéú, encoded in UTF-8 (not in any single-byte encoding).
Live demo: one two (I can't reproduce the other behaviour with online compilers).
Is this standard-conforming? Shouldn't I be able to leave the global locale alone and use imbue instead?
Should either of the described behaviours be classified as an implementation bug?
First of all, you should use wcout together with wcin.
Then you have two possible solutions:
1) Deactivate synchronization between the iostream and cstdio streams by calling
ios_base::sync_with_stdio(false);
Note that this should be the first call, before any input or output on the standard streams; otherwise the behavior is implementation-defined.
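#include <iostream>
#include <locale>
#include <string>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);   // must come before any I/O
    wcin.imbue(locale("C.UTF-8"));
    wstring s;
    wcin >> s;
    wcout << s.length() << " " << (s == L"áéú");
    return 0;
}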
2) Set the C library locale with setlocale and imbue the same locale into wcout:
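#include <clocale>
#include <iostream>
#include <locale>
#include <string>
using namespace std;

int main() {
    std::setlocale(LC_ALL, "C.UTF-8");   // C library locale, used by getwc
    wcout.imbue(locale("C.UTF-8"));
    wstring s;
    wcin >> s;
    wcout << s.length() << " " << (s == L"áéú");
    return 0;
}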
I tested both of them on ideone and they work fine. I don't have clang++/libc++ at hand, so I wasn't able to test that behavior, sorry.