I am trying to print a Russian "ф" (U+0444 CYRILLIC SMALL LETTER EF) character, which is given a code of decimal 1092. Using C++, how can I print out this character? I would have thought something along the lines of the following would work, yet...
int main (){ wchar_t f = '1060'; cout << f << endl; }
It can represent all 1,114,112 Unicode characters. Most C code that deals with strings on a byte-by-byte basis still works, since UTF-8 is fully compatible with 7-bit ASCII. Characters usually require fewer than four bytes. String sort order is preserved.
Unicode Character “C” (U+0043)
To insert a Unicode character, type the character code, press ALT, and then press X. For example, to type a dollar symbol ($), type 0024, press ALT, and then press X. For more Unicode character codes, see Unicode character code charts by script.
As far as I know, the standard C's char data type is ASCII, 1 byte (8 bits).
To represent the character you can use Universal Character Names (UCNs). The character 'ф' has the Unicode value U+0444 and so in C++ you could write it '\u0444' or '\U00000444'. Also if the source code encoding supports this character then you can just write it literally in your source code.
// both of these assume that the character can be represented with // a single char in the execution encoding char b = '\u0444'; char a = 'ф'; // this line additionally assumes that the source character encoding supports this character
Printing such characters out depends on what you're printing to. If you're printing to a Unix terminal emulator, the terminal emulator is using an encoding that supports this character, and that encoding matches the compiler's execution encoding, then you can do the following:
#include <iostream> int main() { std::cout << "Hello, ф or \u0444!\n"; }
This program does not require that 'ф' can be represented in a single char. On OS X and most any modern Linux install this will work just fine, because the source, execution, and console encodings will all be UTF-8 (which supports all Unicode characters).
Things are harder with Windows and there are different possibilities with different tradeoffs.
Probably the best, if you don't need portable code (you'll be using wchar_t, which should really be avoided on every other platform), is to set the mode of the output file handle to take only UTF-16 data.
#include <iostream> #include <io.h> #include <fcntl.h> int main() { _setmode(_fileno(stdout), _O_U16TEXT); std::wcout << L"Hello, \u0444!\n"; }
Portable code is more difficult.
When compiling with -std=c++11
, one can simply
const char *s = u8"\u0444"; cout << s << endl;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With