
clang: converting const char16_t* (UTF-16) to wstring (UCS-4)

I'm trying to convert UTF-16 encoded strings to UCS-4.

If I understand correctly, C++11 provides this conversion through codecvt_utf16.

My code is something like:

#include <iostream>
#include <locale>
#include <memory>
#include <codecvt>
#include <string>

using namespace std;

int main()
{
    u16string s;

    s.push_back('h');
    s.push_back('e');
    s.push_back('l');
    s.push_back('l');
    s.push_back('o');

    wstring_convert<codecvt_utf16<wchar_t>, wchar_t> conv;
    wstring ws = conv.from_bytes(reinterpret_cast<const char*> (s.c_str()));

    wcout << ws << endl;

    return 0;
}

Note: the explicit push_back calls work around the fact that my version of clang (Xcode 4.2) doesn't support Unicode string literals.

When I run the code, it terminates with an uncaught exception. Am I doing something illegal here? I thought it should work, because the const char* that I pass to wstring_convert points to UTF-16 encoded data, right? I also considered endianness as the cause, but I checked and it's not the case.

ryaner asked Dec 28 '22 09:12


1 Answer

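Two errors:

1) The from_bytes() overload that takes a single const char* expects a null-terminated byte string, but the second byte of your data is already '\0': on a little-endian machine 'h' (0x0068) is stored as the bytes 0x68 0x00, so the conversion stops after one byte.

2) codecvt_utf16 assumes big-endian input by default. Your system is likely little-endian, so you need to convert from UTF-16LE to UCS-4 explicitly: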

#include <iostream>
#include <locale>
#include <memory>
#include <codecvt>
#include <string>

using namespace std;

int main()
{
    u16string s;

    s.push_back('h');
    s.push_back('e');
    s.push_back('l');
    s.push_back('l');
    s.push_back('o');

    wstring_convert<codecvt_utf16<wchar_t, 0x10ffff, little_endian>,
                     wchar_t> conv;
    wstring ws = conv.from_bytes(
                     reinterpret_cast<const char*> (&s[0]),
                     reinterpret_cast<const char*> (&s[0] + s.size()));

    wcout << ws << endl;

    return 0;
}

Tested with Visual Studio 2010 SP1 on Windows and clang++/libc++-svn on Linux.
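
For anyone who wants to see the first error directly, or to avoid the byte-level cast altogether, here is a minimal sketch. It is not part of the tested code above. The helper names bytes_before_first_nul and utf16_to_utf32 are made up for illustration, and the decoder is a hand-written alternative to codecvt_utf16, not what the library does internally. It works on char16_t code units instead of raw bytes, so endianness never comes into play, and it returns a u32string because wchar_t is only 32 bits wide on some platforms.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: counts the bytes that precede the first '\0' when the
// UTF-16 buffer is viewed as a C string. This is all that a function taking a
// single null-terminated const char* can see of the data.
std::size_t bytes_before_first_nul(const std::u16string& s)
{
    return std::strlen(reinterpret_cast<const char*>(s.c_str()));
}

// Hypothetical helper: decodes UTF-16 code units into UTF-32 code points by
// combining surrogate pairs by hand. Throws std::range_error on an unpaired
// surrogate.
std::u32string utf16_to_utf32(const std::u16string& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size())
                throw std::range_error("truncated surrogate pair");
            const char32_t low = in[i + 1];
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::range_error("unpaired high surrogate");
            out.push_back(static_cast<char32_t>(
                0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)));
            ++i; // the low surrogate has been consumed as well
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            throw std::range_error("unpaired low surrogate");
        } else {
            // Code point in the Basic Multilingual Plane: copy it unchanged.
            out.push_back(unit);
        }
    }
    return out;
}
```

For the "hello" string from the question, bytes_before_first_nul returns at most 1 although the buffer holds 10 bytes of text, which is why the single-pointer overload never sees the whole string. utf16_to_utf32 can be called on the u16string directly, with no reinterpret_cast.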

Cubbi answered Dec 30 '22 10:12