Is there any built-in function that converts wstring or wchar_t* to UTF-8 in Linux?

I want to convert a wstring to UTF-8 encoding, but I want to use Linux's built-in functions.

Is there any built-in function that converts a wstring or wchar_t* to UTF-8 in Linux with a simple invocation?

Example:

wstring str = L"file_name.txt";
wstring mode = L"a";
fopen([FUNCTION](str), [FUNCTION](mode)); // Simple invocation.
cout << [FUNCTION](str); // Simple invocation.

asked Sep 19 '11 10:09 by Amir Saniyan

2 Answers

The C++ language standard has no notion of explicit encodings. It only contains an opaque notion of a "system encoding", for which wchar_t is a "sufficiently large" type.

To convert from the opaque system encoding to an explicit external encoding, you must use an external library. The library of choice would be iconv() (converting from WCHAR_T to UTF-8), which is part of POSIX and available on many platforms; on Windows, the WideCharToMultiByte function (with CP_UTF8) is guaranteed to produce UTF-8.

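C++11 adds UTF-8 string literals of the form std::string s = u8"Hello World: \U0010FFFF";. These are already encoded in UTF-8, but they cannot interface with the opaque wstring other than in the way described above. (Since C++20, u8 literals have type const char8_t[], so that initialization needs std::u8string rather than std::string.)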

See this question for a bit more background.

answered Sep 19 '22 00:09 by Kerrek SB


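If/when your compiler supports enough of C++11, you could use wstring_convert (note that std::wstring_convert and std::codecvt_utf8 were deprecated in C++17 without a standard replacement, although standard libraries still provide them):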

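#include <iostream>
#include <codecvt>
#include <locale>
#include <string>
int main()
{
    // Converter between wchar_t strings and UTF-8 byte strings.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8_conv;
    std::wstring str = L"file_name.txt";
    // to_bytes() returns a std::string holding the UTF-8 encoding of str.
    std::cout << utf8_conv.to_bytes(str) << '\n';
}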

Tested with clang++ 2.9/libc++ on Linux and Visual Studio 2010 on Windows.

answered Sep 17 '22 00:09 by Cubbi