How to use utf8 character arrays in c++?

Question

Is it possible to have char *s to work with utf8 encoding in C++ (VC2010)?

For example if my source file is saved in utf8 and I write something like this:

const char* c = "aäáéöő";

Is it possible to make this UTF-8 encoded? And if so, how can I use

char* c2 = new char[strlen("aäáéöő") + 1];

for dynamic allocation if characters can be variable length?

James Kanze · Accepted Answer

The encoding for narrow character string literals is implementation defined, so you'd really have to read the documentation (if you can find it). A quick experiment shows that both VC++ (VC8, anyway) and g++ (4.4.2, anyway) actually just copy the bytes from the source file; the string literal will be in whatever encoding your editor saved it in. (This is clearly in violation of the standard, but it seems to be common practice.)
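One way to repeat that kind of experiment is to dump the bytes the compiler actually stored for a literal. The following is only a minimal sketch; `hex_dump` is a hypothetical helper name, not something from the standard library or from the original answer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the bytes of a NUL-terminated narrow string
// as space-separated lowercase hex, e.g. "61 c3 a4".
std::string hex_dump(const char* s) {
    assert(s != nullptr);
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (const char* p = s; *p != '\0'; ++p) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned char b = static_cast<unsigned char>(*p);
        if (p != s) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Calling `hex_dump("aä")` in a source file saved as UTF-8 should give `61 c3 a4` if the compiler copied the bytes through unchanged, whereas `61 e4` would indicate that the literal ended up in Latin-1 or Windows-1252.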

C++11 has UTF-8 string literals, which would allow you to write u8"text", and be assured that "text" was encoded in UTF-8. But I don't really expect it to work reliably: the problem is that in order to do this, the compiler has to know what encoding your source file has. In all probability, compiler writers will continue to ignore the issue, just copying the bytes from the source file, and achieve conformance simply by documenting that the source file must be in UTF-8 for these features to work.
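On the dynamic allocation part of the question: assuming the literal really does end up as UTF-8, strlen still works, because it counts bytes up to the terminating NUL rather than characters, and the byte count is what new[] needs. The sketch below illustrates this; `utf8_code_points` and `utf8_duplicate` are hypothetical helper names, and the counting function assumes its input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts code points in a NUL-terminated string that is
// assumed to be valid UTF-8, by counting every byte that is not a
// continuation byte (bit pattern 10xxxxxx).
std::size_t utf8_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Hypothetical helper: returns a heap copy of s. The caller must delete[] it.
char* utf8_duplicate(const char* s) {
    assert(s != nullptr);
    std::size_t bytes = std::strlen(s);   // bytes, not characters
    char* copy = new char[bytes + 1];     // +1 for the terminating NUL
    std::memcpy(copy, s, bytes + 1);      // copies the NUL as well
    return copy;
}
```

For "aäáéöő" encoded as UTF-8, strlen reports 11 bytes (one for "a", two for each of the other five letters), while `utf8_code_points` reports 6. The buffer therefore needs 12 chars including the terminator.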

Tags:

c++

utf-8

sekmet64