Store non-English string in std::string

I have a simple string in std::wstring

std::wstring tempStr = _T("F:\\Projects\\Current_자동_\\Cam.xml");

I want to store this string in a std::string.

I tried the code below, but the result is not the same as the input string:

std::wstring tempStr = _T("F:\\Projects\\Current_자동_\\Cam.xml");
//setup converter
typedef  std::codecvt_utf8_utf16 <wchar_t> convert_type;
std::wstring_convert<convert_type, wchar_t> converter;

//use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
std::string converted_str = converter.to_bytes( tempStr );

The Korean string present in the input string is converted to "ìžë™".

Is there any way I can get the same string in std::string?

Expected result:

converted_str should contain F:\Projects\Current_자동_\Cam.xml

Below is a screenshot from the debugger showing the values produced by three different conversion approaches. None of them gives the desired value.

[Image: debugging screenshot]

asked Mar 13 '14 10:03 by Narendra


1 Answer

Your conversion code is fine.

In fact, in UTF-8 (the encoding of the string you store in std::string), the characters 자동 correspond to:

자 (UTF-16 0xC790) ---> UTF-8:  EC 9E 90
동 (UTF-16 0xB3D9) ---> UTF-8:  EB 8F 99
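
To see where those bytes come from, here is a minimal sketch of UTF-8 encoding for a single code point. The helper name encode_utf8 is hypothetical (it is not part of the standard library), and the sketch assumes the input is a valid Unicode scalar value. Both Korean characters fall in the range U+0800..U+FFFF, so each one becomes three bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a standard function): encode one Unicode
// code point as a UTF-8 byte sequence.
std::string encode_utf8(char32_t cp)
{
    // Assumption: the caller passes a valid scalar value (no surrogates).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, encode_utf8(0xC790) yields the bytes EC 9E 90 and encode_utf8(0xB3D9) yields EB 8F 99, matching the table above.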

If you run the following program, which just prints the converted UTF-8 bytes, you get this output:

ec 9e 90 eb 8f 99

#include <iomanip>      // For std::hex
#include <iostream>     // For console output
#include <string>       // For STL strings
#include <codecvt>      // For std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>       // For std::wstring_convert

void print_char_hex(const char ch)
{
    auto * p = reinterpret_cast<const unsigned char*>(&ch);
    int i = *p;
    std::cout << std::hex << i << ' ';
}

int main()
{
    std::wstring utf16_str = L"\xC790\xB3D9";

    // setup converter
    typedef  std::codecvt_utf8_utf16<wchar_t> convert_type;
    std::wstring_convert<convert_type, wchar_t> converter;

    // use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
    std::string converted_str = converter.to_bytes( utf16_str );

    // Output the converted bytes (UTF-8)
    for (size_t i = 0; i < converted_str.length(); ++i)
    {
        print_char_hex(converted_str[i]);
    }
    std::cout << std::endl;
}
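
The "ìžë™" from the question is consistent with these same six bytes being displayed as Windows-1252 text instead of UTF-8. That the debugger uses this code page is an assumption, not something the question confirms. The sketch below uses a hypothetical helper, view_as_cp1252, to show which characters such a viewer would produce. Bytes that Windows-1252 leaves undefined (0x90 and 0x8F here) are skipped.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map a byte in 0x80..0x9F to the code point that
// Windows-1252 assigns to it, or 0 if the byte is undefined there.
char32_t cp1252_high(unsigned char b)
{
    assert(b >= 0x80 && b <= 0x9F);
    static const char32_t table[32] = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
    };
    return table[b - 0x80];
}

// Hypothetical helper: reinterpret raw bytes as Windows-1252 and return
// the code points a viewer using that code page would display.
std::u32string view_as_cp1252(const std::string& bytes)
{
    std::u32string out;
    for (char ch : bytes) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b >= 0x80 && b <= 0x9F) {
            const char32_t cp = cp1252_high(b);
            if (cp != 0) out += cp;          // skip undefined bytes
        } else {
            out += static_cast<char32_t>(b); // identical to Latin-1 here
        }
    }
    return out;
}
```

Passing the bytes EC 9E 90 EB 8F 99 gives U+00EC (ì), U+017E (ž), U+00EB (ë) and U+2122 (™). In other words, the std::string holds the correct UTF-8 data, and only the way it is displayed is wrong.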
answered Sep 18 '22 13:09 by Mr.C64