I have a problem writing Unicode to a file in C++. I want to write a few smiley faces, the kind you get by typing ALT+NUMPAD(2), to a file with my own extension. I can display one in CMD by making a char, assigning it the value '\2' and printing it, but the smiley doesn't show up when I write it to a file.
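Here is a snippet of code from my program:
#include <fstream>
int main()
{
    std::ofstream myfile;
    myfile.open("C:\\Users\\My Username\\test.exampleCodeFile"); // backslashes must be escaped
    myfile << "\2";   // writes the single byte 0x02
    myfile.close();
}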
It writes to the file, but it doesn't display what I want. I would show you what it displays, but StackOverflow won't let me post the character. Thanks in advance.
UTF-8 is a Unicode character encoding. It takes the code point of a given Unicode character and translates it into a sequence of one to four bytes; it also does the reverse, reading those bytes and converting them back into code points.
UTF-8 can represent every one of the 1,114,112 Unicode code points except the 2,048 surrogates, which are reserved for UTF-16. Most C code that processes strings byte by byte still works, because UTF-8 is backward compatible with 7-bit ASCII: ASCII characters are encoded as the same single bytes, and every byte of a multibyte sequence has its high bit set.
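The byte layout UTF-8 uses can be sketched as a small encoder. This is an illustration only; the function name `encode_utf8` is hypothetical and not part of any library.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Surrogates (U+D800..U+DFFF) and values above U+10FFFF are rejected,
// because UTF-8 cannot represent them.
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                          // 1 byte: 0xxxxxxx (identical to ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                  // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogate code point");
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {              // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("code point above U+10FFFF");
    }
    return out;
}
```

For example, `encode_utf8(U'\u263B')` (the ☻ smiley) yields the three bytes E2 98 BB, while `encode_utf8(U'A')` yields the single byte 0x41, exactly as in ASCII.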
C++ provides a wide-character type, wchar_t, which can store Unicode strings. The size and encoding of wchar_t are implementation-defined: it is 32 bits (UTF-32) with GCC and Clang on Linux, but 16 bits (UTF-16) with Visual C++ on Windows. The class wstring, defined in <string>, is a sequence of wchar_ts, just as the string class is a sequence of chars.
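The difference between a narrow UTF-8 std::string and a std::wstring can be made concrete by counting elements. A minimal sketch, assuming a compiler whose wchar_t is either 16 or 32 bits; the names `UnitCounts` and `count_units` are hypothetical:

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: how many elements each string type needs for the
// same characters, plus the width of wchar_t on the current compiler.
struct UnitCounts {
    std::size_t smiley_utf8_bytes;  // U+263B stored as UTF-8 in std::string
    std::size_t smiley_wide_units;  // U+263B stored in std::wstring
    std::size_t emoji_wide_units;   // U+1F600 (outside the BMP) in std::wstring
    std::size_t wchar_bits;         // size of wchar_t in bits
};

UnitCounts count_units()
{
    const std::string smiley_utf8 = "\xE2\x98\xBB";  // UTF-8 bytes spelled out
    const std::wstring smiley_wide = L"\u263B";      // one element for 16- or 32-bit wchar_t
    const std::wstring emoji_wide = L"\U0001F600";   // surrogate pair if wchar_t is 16-bit
    return { smiley_utf8.size(), smiley_wide.size(), emoji_wide.size(),
             sizeof(wchar_t) * CHAR_BIT };
}
```

The smiley takes three chars but one wchar_t; a character outside the Basic Multilingual Plane takes two wchar_ts on Windows and one on Linux.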
You have to use Unicode to specify the characters you want to display. In the console, the byte 02h is translated by code page 437 (cp437) into the Unicode character U+263B (BLACK SMILING FACE, ☻). Saving the source file as UTF-8 with BOM makes using Unicode easier, because you can paste or type the characters you want instead of resorting to Unicode escape codes.
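The console's cp437 translation of the smiley bytes can be sketched as a lookup. This covers only the two smiley bytes; a full cp437 table has 256 entries, and the function name `cp437_smiley_to_unicode` is hypothetical:

```cpp
#include <optional>

// Hypothetical, deliberately partial lookup: the Unicode characters that
// code page 437 displays for bytes 01h and 02h.
std::optional<char32_t> cp437_smiley_to_unicode(unsigned char byte)
{
    switch (byte) {
    case 0x01: return U'\u263A';    // WHITE SMILING FACE
    case 0x02: return U'\u263B';    // BLACK SMILING FACE (the '\2' in the question)
    default:   return std::nullopt; // not covered by this sketch
    }
}
```

The point is that writing the byte 02h to a file stores the control character U+0002, not U+263B; the file has to contain the encoded U+263B itself.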
For a file stream, the stream needs to be configured for UTF-8. There are various ways to do this, and they depend on the compiler, but using Visual Studio 2012, source saved as UTF-8 with BOM, and a bit of Googling:
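#include <locale>
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <iostream>
#include <io.h>      // _setmode, _fileno (Windows CRT only)
#include <fcntl.h>   // _O_U16TEXT (Windows CRT only)
using namespace std;

int main()
{
    // Locale whose codecvt facet converts wchar_t to UTF-8 on output.
    const locale utf8_locale = locale(locale(), new codecvt_utf8<wchar_t>());

    wofstream f;
    f.imbue(utf8_locale);      // imbue before opening so every write goes through UTF-8
    f.open(L"sample.txt");     // wide file name: an MSVC extension
    f << L"\u263b我是美国人。我叫马克。" << endl;

    // Put stdout in UTF-16 mode so wcout can print non-ASCII text to the console.
    _setmode(_fileno(stdout), _O_U16TEXT);
    wcout << L"\u263b我是美国人。我叫马克。" << endl;
}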
Content of sample.txt as viewed in Notepad:
☻我是美国人。我叫马克。
Hex dump (correct UTF-8):
E298BBE68891E698AFE7BE8EE59BBDE4BABAE38082E68891E58FABE9A9ACE5858BE380820D0A
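A dump in this style (two uppercase hex digits per byte, no separators) can be produced with a few lines of code. A minimal sketch; the helper name `to_hex` is hypothetical:

```cpp
#include <string>

// Hypothetical helper: format each byte as two uppercase hex digits.
std::string to_hex(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {   // unsigned so bytes >= 0x80 index correctly
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Reading sample.txt in binary mode into a std::string and passing it to `to_hex` should reproduce a dump like the one above; for instance, the smiley alone gives "E298BB" and the Windows line ending gives "0D0A".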
Console output, cut and pasted here. Without a font that covers Chinese, the console displayed � for each Chinese character, but the characters display correctly when pasted into SO or Notepad.
☻我是美国人。我叫马克。
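Since std::codecvt_utf8 has been deprecated since C++17, a compiler-independent alternative (not the approach used in the answer's code) is to keep the text as UTF-8 bytes in a plain std::string and write it with a narrow std::ofstream, so no locale or codecvt facet is involved. A minimal sketch; the function name `write_utf8_file` is hypothetical:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write text that is already UTF-8 encoded to a file.
// Binary mode keeps the bytes exactly as given (no "\n" -> "\r\n" on Windows).
bool write_utf8_file(const std::string& path, const std::string& utf8_text)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;
    out.write(utf8_text.data(), static_cast<std::streamsize>(utf8_text.size()));
    out.close();               // flush now so a write error shows up in the stream state
    return !out.fail();
}
```

For the question's smiley, `write_utf8_file("smiley.txt", "\xE2\x98\xBB\n")` should produce a file containing E2 98 BB 0A, which a UTF-8 aware editor shows as ☻ (the file name is only an example).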