
wcin.imbue and UTF-8

On Linux with g++, if I set a UTF-8 global locale, wcin correctly transcodes UTF-8 input to the internal wchar_t encoding.

However, if I keep the classic global locale and imbue a UTF-8 locale into wcin, this doesn't happen. Input either fails altogether, or each individual byte gets converted to a wchar_t independently.

With clang++ and libc++, neither setting the global locale nor imbuing the locale into wcin works.

#include <iostream>
#include <locale>
#include <string>

using namespace std;

int main() {
    if(true)        
        // this works with g++, but not with clang++/libc++
        locale::global(locale("C.UTF-8"));
    else
        // this doesn't work with either implementation
        wcin.imbue(locale("C.UTF-8"));
    wstring s;
    wcin >> s;
    cout << s.length() << " " << (s == L"áéú");
    return 0;
}

The input stream contains only the characters áéú, encoded as UTF-8 rather than any single-byte encoding.

Live demo: one two (I can't reproduce the other behaviour with online compilers).

Is this standard-conforming? Shouldn't I be able to leave the global locale alone and use imbue instead?

Should either of the described behaviours be classified as an implementation bug?

n. 1.8e9-where's-my-share m. asked Sep 07 '15 12:09



1 Answer

First of all, you should pair wcin with wcout rather than cout.

With that fixed, there are two possible solutions:

1) Disable synchronization between the iostream and cstdio streams by calling

   ios_base::sync_with_stdio(false);

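Note that this must be called before any input or output is performed on the standard streams; otherwise its effect is implementation-defined. As far as I can tell, with libstdc++ the synchronized wcin reads through C stdio (getwc), which follows the C locale rather than the locale imbued into the stream, so imbue only takes effect once synchronization is off.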

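#include <iostream>
#include <locale>
#include <string>

using namespace std;

int main() {
    // Must run before any I/O on the standard streams.
    ios_base::sync_with_stdio(false);
    wcin.imbue(locale("C.UTF-8"));

    wstring s;
    wcin >> s;
    wcout << s.length() << " " << (s == L"áéú");
    return 0;
}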

2) Set the C locale with setlocale and imbue the same locale into wcout:

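#include <clocale>
#include <iostream>
#include <locale>
#include <string>

using namespace std;

int main() {
    // Set the C library locale (used by the C stdio functions).
    std::setlocale(LC_ALL, "C.UTF-8");
    wcout.imbue(locale("C.UTF-8"));

    wstring s;
    wcin >> s;
    wcout << s.length() << " " << (s == L"áéú");
    return 0;
}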
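
Both snippets assume that a locale named C.UTF-8 is installed. If it isn't, locale("C.UTF-8") throws std::runtime_error. Here is a minimal sketch of a fallback; first_available_locale is a hypothetical helper of my own, not a standard function, and which names exist depends on the system:

```cpp
#include <initializer_list>
#include <locale>
#include <stdexcept>

// Hypothetical helper (not part of the standard library): returns the first
// locale in `names` that can be constructed on this system, or the classic
// "C" locale if none of them is available.
std::locale first_available_locale(std::initializer_list<const char*> names) {
    for (const char* name : names) {
        try {
            return std::locale(name);  // throws std::runtime_error if unknown
        } catch (const std::runtime_error&) {
            // Not installed here; try the next candidate.
        }
    }
    return std::locale::classic();
}
```

It could then be used as wcin.imbue(first_available_locale({"C.UTF-8", "en_US.UTF-8"})); where en_US.UTF-8 is only an assumed alternative name. Falling back to the classic locale means the input will not be decoded as UTF-8, so you may prefer to report an error in that case.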

I tested both on ideone and they work fine. I don't have clang++/libc++ at hand, so I couldn't test the behavior there, sorry.
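
If you want to tell which of the failure modes from the question you are hitting, compare the length the program prints. The sketch below contrasts the two outcomes without touching any locale: widen_bytes and decode_utf8 are hypothetical helpers written for illustration, and decode_utf8 assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: widen every byte on its own, which is what the
// "each byte becomes one wchar_t" failure mode amounts to.
std::wstring widen_bytes(const std::string& bytes) {
    std::wstring out;
    for (unsigned char b : bytes) out.push_back(static_cast<wchar_t>(b));
    return out;
}

// Hypothetical helper: decode UTF-8 into one wchar_t per code point.
// Sketch only: assumes well-formed input, and code points above U+FFFF
// will not fit where wchar_t is 16 bits wide.
std::wstring decode_utf8(const std::string& bytes) {
    std::wstring out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char lead = static_cast<unsigned char>(bytes[i++]);
        unsigned long cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        while (extra-- > 0 && i < bytes.size())
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i++]) & 0x3F);
        out.push_back(static_cast<wchar_t>(cp));
    }
    return out;
}
```

For the six UTF-8 bytes of áéú, widen_bytes yields a string of length 6, while decode_utf8 yields length 3. So a printed length of 6 means the stream widened bytes individually, and 3 means the UTF-8 was actually decoded.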

Roman Pustylnikov answered Nov 03 '22 16:11
