 

Writing Unicode to a file in C++

I have a problem writing Unicode to a file in C++. I want to write a few smiley faces, the kind you get by typing ALT+NUMPAD 2, to a file with my own extension. I can display one in CMD by creating a char with the value '\2', which shows up as a smiley face, but writing the same char to a file doesn't produce one.

Here is a snippet of code for my program:

ofstream myfile;
myfile.open("C:\\Users\\My Username\\test.exampleCodeFile");  // backslashes must be doubled inside a string literal
myfile << "\2";   // writes a single byte with value 0x02
myfile.close();

It writes to the file, but the file doesn't show what I want. I would show you what it displays, but Stack Overflow won't let me post the character. Thanks in advance.

Garrett Ratliff asked Apr 09 '13 19:04



People also ask

Can C use Unicode?

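Yes. A common approach is to store text as UTF-8 in ordinary char strings. UTF-8 can encode every Unicode code point (U+0000 through U+10FFFF, excluding the surrogates U+D800 through U+DFFF). Most C code that handles strings byte by byte still works, since UTF-8 is fully compatible with 7-bit ASCII, though functions such as strlen count bytes rather than characters.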

Is UTF-8 Unicode?

UTF-8 is one of the character encodings defined by the Unicode standard. It converts each Unicode code point into a sequence of one to four bytes, and decoding reverses the process, turning those bytes back into code points.

Does C++ string support Unicode?

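C++ provides a wide-character type, wchar_t, which is often used to store Unicode text. Its size and encoding are implementation-defined: with Visual C++ on Windows, wchar_t is 16 bits and holds UTF-16, while GCC and Clang on Linux and macOS make it 32 bits and use UTF-32. std::wstring, defined in <string>, is a sequence of wchar_t, just as std::string is a sequence of char. C++11 also added char16_t and char32_t, with the matching std::u16string and std::u32string.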


1 Answer

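You have to use Unicode to specify the characters you want to display. The console shows byte 02h as a smiley because it translates output through code page 437 (cp437), which maps that byte to the Unicode character U+263B (☻). A text editor opening your file does not apply cp437, so to get the smiley into a file you must write U+263B itself in a Unicode encoding such as UTF-8. Saving the source file as UTF-8 with BOM makes working with Unicode easier, because you can paste or type the characters you want directly instead of resorting to Unicode escape codes.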

A file stream needs to be configured to write UTF-8. There are various ways to do this, and the details depend on the compiler; here is one that works with Visual Studio 2012 and source saved as UTF-8 with BOM, put together with a bit of Googling:

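#include <locale>
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <iostream>
#include <io.h>      // _setmode, _fileno: Windows CRT only
#include <fcntl.h>   // _O_U16TEXT: Windows CRT only
using namespace std;

int main()
{
    // Locale whose codecvt facet turns each wchar_t into UTF-8 bytes on output.
    // With 16-bit wchar_t this handles BMP characters only; codecvt_utf8_utf16<wchar_t>
    // would also handle surrogate pairs.
    const std::locale utf8_locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
    wofstream f(L"sample.txt");   // wide file name: accepted by MSVC
    f.imbue(utf8_locale);         // imbue before anything is written
    f << L"\u263b我是美国人。我叫马克。" << endl;

    // Put stdout in UTF-16 mode so wcout can send Unicode text to the console.
    _setmode(_fileno(stdout), _O_U16TEXT);
    wcout << L"\u263b我是美国人。我叫马克。" << endl;
}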

Content of sample.txt as viewed in Notepad:

☻我是美国人。我叫马克。

Hex dump (correct UTF-8):

E298BBE68891E698AFE7BE8EE59BBDE4BABAE38082E68891E58FABE9A9ACE5858BE380820D0A

Console output, cut and pasted here. In the console itself each Chinese character appeared as � because the console font lacked the glyphs, but the characters display correctly when pasted into Stack Overflow or Notepad.

☻我是美国人。我叫马克。
Mark Tolonen answered Sep 28 '22 09:09
