
How to convert Unicode code points to UTF-8 in C++?

Tags: c++, unicode, utf-8

I have an array of Unicode code points:

unsigned short array[3]={0x20ac,0x20ab,0x20ac};

I want to convert these to UTF-8 and write them to a file byte by byte using C++.

Example: 0x20ac should be converted to e2 82 ac.

Or is there another method that can write Unicode characters to a file directly?

Venkatesan asked Dec 06 '13 08:12


3 Answers

Finally! With C++11!

#include <string>
#include <locale>
#include <codecvt>
#include <cassert>

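int main()
{
    // Note: std::wstring_convert and <codecvt> are deprecated since C++17.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    // U+20AC (EURO SIGN) encodes to the three bytes e2 82 ac.
    std::string u8str = converter.to_bytes(0x20ac);
    assert(u8str == "\xe2\x82\xac");
}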
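
To apply the same converter to the whole array from the question and write the result to a file, something along these lines may work. This is only a minimal sketch: codePointsToUtf8 and writeUtf8File are hypothetical helper names, and it assumes every 16-bit value is a complete code point from the Basic Multilingual Plane (no surrogate pairs).

```cpp
#include <codecvt>
#include <cstddef>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: treats each 16-bit value as one BMP code point.
// Uses the C++11 facilities from the answer above, which may trigger
// deprecation warnings when compiling as C++17 or later.
std::string codePointsToUtf8(const unsigned short* points, std::size_t count)
{
    // Widen every value to char32_t so the converter sees UTF-32 input.
    std::u32string utf32(points, points + count);
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
    return converter.to_bytes(utf32);
}

// Hypothetical helper: writes the bytes unchanged. Binary mode prevents
// newline translation on Windows.
void writeUtf8File(const char* fileName, const std::string& utf8)
{
    std::ofstream out(fileName, std::ios::binary);
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
}
```

For the array in the question, writeUtf8File("out.txt", codePointsToUtf8(array, 3)) should write the nine bytes e2 82 ac e2 82 ab e2 82 ac.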
sms answered Oct 19 '22 07:10


The following code may help. It uses ATL's conversion classes, so it is Windows-only:

#include <atlconv.h>
#include <atlstr.h>

#define ASSERT ATLASSERT

int main()
{
    const CStringW unicode1 = L"\x0391 and \x03A9"; // 'Alpha' and 'Omega'

    const CStringA utf8 = CW2A(unicode1, CP_UTF8);

    ASSERT(utf8.GetLength() > unicode1.GetLength());

    const CStringW unicode2 = CA2W(utf8, CP_UTF8);

    ASSERT(unicode1 == unicode2);
}
Santosh Dhanawade answered Oct 19 '22 07:10


The term Unicode refers to a standard for the encoding and handling of text. It includes encodings such as UTF-8, UTF-16, UTF-32 and UCS-2.

I guess you are programming in a Windows environment, where Unicode typically refers to UTF-16.

When working with Unicode in C++, I would recommend the ICU library.

If you are programming on Windows, don't want to use an external library, and have no constraints regarding platform dependencies, you can use WideCharToMultiByte.
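
If neither ICU nor a Windows API call is an option, the conversion is small enough to write by hand. The following is a minimal sketch, not a hardened implementation: appendUtf8 and utf16ToUtf8 are hypothetical names, the input is assumed to be UTF-16 code units, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of one code point to `out`.
// Assumes cp is a valid scalar value (at most 0x10FFFF, not a surrogate).
void appendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts UTF-16 code units to UTF-8, combining surrogate pairs.
// Unpaired surrogates become U+FFFD (REPLACEMENT CHARACTER).
std::string utf16ToUtf8(const std::uint16_t* units, std::size_t count)
{
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        std::uint32_t cp = units[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < count &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

For 0x20ac the three-byte branch applies: the top four bits (0010) give e2, the middle six (000010) give 82, and the low six (101100) give ac, which matches the e2 82 ac expected in the question. The returned string can then be written to a std::ofstream opened in binary mode.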

Example for ICU:

#include <iostream>
#include <unicode/ustream.h>

using icu::UnicodeString;

int main(int, char**) {
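    //
    // Convert from UTF-16 to UTF-8.
    // The wchar_t constructor of UnicodeString is only available where
    // wchar_t is 16 bits wide (as on Windows).
    //
    std::wstring utf16 = L"foobar";
    UnicodeString str(utf16.c_str());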
    std::string utf8;
    str.toUTF8String(utf8);

    std::cout << utf8 << std::endl;
}

To do exactly what you want:

// Assuming you have ICU\include in your include path
// and ICU\lib(64) in your library path.
#include <cstdint>
#include <iostream>
#include <fstream>
#include <string>
#include <unicode/ustream.h>
#pragma comment(lib, "icuio.lib")
#pragma comment(lib, "icuuc.lib")

using icu::UnicodeString;

void writeUtf16ToUtf8File(char const* fileName, wchar_t const* arr, size_t arrSize) {
    // UnicodeString takes its length as int32_t, so narrow explicitly.
    UnicodeString str(arr, static_cast<int32_t>(arrSize));
    std::string utf8;
    str.toUTF8String(utf8);

    // Binary mode keeps the UTF-8 bytes untouched (no newline translation).
    std::ofstream out(fileName, std::ofstream::binary);
    out << utf8;
    out.close();
}
Max Truxa answered Oct 19 '22 06:10