Unicode support in C++0x

I'm trying to use the new Unicode character types in C++0x, so I wrote this sample code:

#include <fstream>
#include <string>
int main()
{
    std::u32string str = U"Hello World";

    std::basic_ofstream<char32_t> fout("output.txt");

    fout<<str;  
    return 0;
}

But after running this program, output.txt is empty. Why isn't it printing Hello World?

Also, is there something like cout and cin already defined for these types, or do stdin and stdout not support Unicode?

Edit: I'm using g++ and Linux.

EDIT: ATTENTION. I have discovered that the standard committee removed Unicode streams from C++0x, so the previously accepted answer is no longer correct. For more information, see my answer!

asked Jan 16 '11 09:01 by UmmaGumma
1 Answer

Support for Unicode string literals began in GCC 4.5, so an older compiler may be the problem.

[edit]

After some digging, I found that streams for these new Unicode literals are described in N2035, which was included in a draft of the standard. According to this document, you need u32ofstream to output your string, but this class is absent from the GCC 4.5 C++0x library.

As a workaround you can use ordinary fstream:

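// Write the raw UTF-32 code units as bytes, in the machine's native byte order
std::ofstream fout2("output2.txt", std::ios::out | std::ios::binary);
fout2.write(reinterpret_cast<const char *>(str.data()), str.size() * sizeof(char32_t));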

This way I've output your string in UTF-32LE on my Intel machine (which is little-endian).
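
Because the workaround writes the in-memory representation, the byte order of the file follows the machine. If you want UTF-32LE regardless of the host, one option is to serialise each code unit byte by byte. The following is only a minimal sketch; the helper names to_utf32le_bytes and write_utf32le are hypothetical, not part of any library:

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: serialise UTF-32 code units as little-endian bytes,
// independent of the byte order of the host machine.
std::string to_utf32le_bytes(const std::u32string& text)
{
    std::string bytes;
    bytes.reserve(text.size() * 4);
    for (char32_t unit : text) {
        const std::uint32_t value = static_cast<std::uint32_t>(unit);
        bytes.push_back(static_cast<char>(value & 0xFF));          // lowest byte first
        bytes.push_back(static_cast<char>((value >> 8) & 0xFF));
        bytes.push_back(static_cast<char>((value >> 16) & 0xFF));
        bytes.push_back(static_cast<char>((value >> 24) & 0xFF));  // highest byte last
    }
    return bytes;
}

// Hypothetical helper: write the serialised bytes to a file in binary mode.
// Returns true if the stream is still in a good state after writing.
bool write_utf32le(const std::string& path, const std::u32string& text)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    const std::string bytes = to_utf32le_bytes(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return out.good();
}
```

Calling write_utf32le("output2.txt", str) with the str from the question should then produce the same bytes on little-endian and big-endian machines. No byte order mark is written here; whether you want one depends on what will read the file.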

[edit]

I was a little bit wrong about the status of u32ofstream: according to the latest draft on the C++ Standards Committee's web site, you have to use std::basic_ofstream<char32_t>, as you did. This class uses the codecvt<char32_t,char,typename traits::state_type> class (see the end of §27.9.1.1), which has to be implemented in the standard library (search for codecvt<char32_t in the document), but it's not available in GCC 4.5.
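
The codecvt<char32_t,char,...> facet is specified to convert between UTF-32 and UTF-8, so until your library provides it you can do that conversion by hand and write the result through an ordinary std::ofstream. This is a rough sketch rather than a replacement for the facet; the names to_utf8 and write_utf8 are hypothetical, and replacing invalid code points with U+FFFD is my own assumption about how to handle errors:

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: encode UTF-32 text as UTF-8.
// Assumption: surrogates and values above U+10FFFF are replaced with U+FFFD.
std::string to_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {               // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {             // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                               // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Hypothetical helper: write the UTF-8 bytes to a file in binary mode.
// Returns true if the stream is still in a good state after writing.
bool write_utf8(const std::string& path, const std::u32string& text)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    const std::string bytes = to_utf8(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return out.good();
}
```

With this, write_utf8("output.txt", str) writes the question's string as UTF-8, which for pure ASCII text such as "Hello World" is byte-for-byte the same as the ASCII encoding.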

answered Sep 30 '22 21:09 by ssmir