
How to use UTF-8 character arrays in C++?

Tags:

c++

utf-8

Is it possible to have char* strings work with UTF-8 encoding in C++ (VC2010)?

For example, if my source file is saved as UTF-8 and I write something like this:

const char* c = "aäáéöő";

Is it possible to make this UTF-8 encoded? And if so, how can I use

char* c2 = new char[strlen("aäáéöő")];

for dynamic allocation when characters can have variable length?

sekmet64 asked May 20 '11 13:05


1 Answer

The encoding for narrow character string literals is implementation-defined, so you'd really have to read the documentation (if you can find it). A quick experiment shows that both VC++ (VC8, anyway) and g++ (4.4.2, anyway) actually just copy the bytes from the source file; the string literal will be in whatever encoding your editor saved it in. (This is clearly in violation of the standard, but it seems to be common practice.)
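One way to repeat that kind of experiment is to dump the bytes the compiler actually stored for a literal. The following is only a minimal sketch; `hex_bytes` is a hypothetical helper name introduced here for illustration, not something from the standard library or from the original experiment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: renders each byte of a NUL-terminated narrow string
// as two lowercase hex digits, separated by single spaces.
std::string hex_bytes(const char* s) {
    assert(s != nullptr);
    static const char digits[] = "0123456789abcdef";
    std::string out;
    const std::size_t n = std::strlen(s);
    for (std::size_t i = 0; i < n; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

If the source file is saved as UTF-8 and the compiler copies the bytes through unchanged, `hex_bytes("ä")` should come out as `c3 a4`. A result of `e4` would instead suggest the literal ended up in Latin-1 or Windows-1252.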

C++11 has UTF-8 string literals, which would allow you to write u8"text", and be assured that "text" was encoded in UTF-8. But I don't really expect it to work reliably: the problem is that in order to do this, the compiler has to know what encoding your source file has. In all probability, compiler writers will continue to ignore the issue, just copying the bytes from the source file, and achieve conformance simply by documenting that the source file must be in UTF-8 for these features to work.
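As for the dynamic allocation part of the question: strlen counts bytes, not characters, so it already gives the right buffer size for a UTF-8 string, as long as one extra byte is reserved for the terminating NUL (the `new char[strlen(...)]` in the question is one byte short). The following is a minimal sketch under the assumption that the input is valid UTF-8; `duplicate_bytes` and `count_code_points` are hypothetical helper names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: heap-allocates a copy of a NUL-terminated byte string.
// strlen returns the number of bytes, which is what the buffer needs
// regardless of how many bytes each character occupies.
char* duplicate_bytes(const char* s) {
    assert(s != nullptr);
    const std::size_t n = std::strlen(s);
    char* copy = new char[n + 1];  // +1 for the terminating NUL
    std::memcpy(copy, s, n + 1);   // copies the terminator as well
    return copy;                   // caller releases it with delete[]
}

// Hypothetical helper: counts code points in valid UTF-8 by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
std::size_t count_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the string from the question, encoded as UTF-8, strlen reports 11 bytes while `count_code_points` reports 6. In practice std::string would handle the same byte-level storage without manual new and delete[].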

James Kanze answered Sep 22 '22 13:09