Is it possible to have char*s work with UTF-8 encoding in C++ (VC2010)?
For example, if my source file is saved as UTF-8 and I write something like this:
const char* c = "aäáéöő";
Is it possible to make this UTF-8 encoded? And if so, how can I use
char* c2 = new char[strlen("aäáéöő")];
for dynamic allocation if characters can be variable length?
The encoding of narrow character string literals is implementation-defined, so you'd really have to read the documentation (if you can find it). A quick experiment shows that both VC++ (VC8, anyway) and g++ (4.4.2, anyway) actually just copy the bytes from the source file; the string literal will be in whatever encoding your editor saved the file in. (This is clearly in violation of the standard, but it seems to be common practice.)
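As for the dynamic allocation part of the question: strlen counts bytes up to the terminating '\0', not characters, so for a buffer that is exactly what you want, as long as you add one for the terminator. Here is a minimal sketch. The helper names duplicate_bytes and count_code_points are made up for illustration, and the sample string is spelled out as explicit UTF-8 bytes so the result does not depend on how the source file was saved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies a NUL-terminated byte string into a new[] buffer.
// The caller releases the result with delete[].
char* duplicate_bytes(const char* src) {
    assert(src != nullptr);
    const std::size_t byte_count = std::strlen(src);  // bytes, not characters
    char* copy = new char[byte_count + 1];            // +1 for the terminating '\0'
    std::memcpy(copy, src, byte_count + 1);
    return copy;
}

// Hypothetical helper: counts code points in a valid UTF-8 string by
// skipping continuation bytes, which all have the bit pattern 10xxxxxx.
std::size_t count_code_points(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// "aäáéöő" written as explicit UTF-8 bytes (assumption: we want UTF-8
// regardless of the encoding of the source file).
const char kSample[] = "a\xC3\xA4\xC3\xA1\xC3\xA9\xC3\xB6\xC5\x91";
```

With this sample, std::strlen(kSample) is 11 while count_code_points(kSample) is 6, since 'a' takes one byte and each of the other five characters takes two. The buffer size has to follow the byte count.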
C++11 has UTF-8 string literals, which would allow you to write u8"text" and be assured that "text" was encoded in UTF-8. But I don't really expect it to work reliably: the problem is that in order to do this, the compiler has to know what encoding your source file has. In all probability, compiler writers will continue to ignore the issue, just copying the bytes from the source file, and achieve conformance simply by documenting that the source file must be in UTF-8 for these features to work.
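If your compiler does support u8 literals, one way to sidestep the source encoding question is to spell the non-ASCII characters as universal character names (\uXXXX), so the source file itself stays pure ASCII. A small sketch follows. The constant name kUtf8Bytes is made up for illustration, and this assumes a C++11 or later compiler, which VC2010 is not.

```cpp
#include <cstddef>

// "aäáéöő" as a UTF-8 literal, written with universal character names so
// the bytes in the source file are plain ASCII.
// sizeof is used instead of strlen because the element type of a u8 literal
// is char in C++11 through C++17 but char8_t in C++20; both are one byte wide.
constexpr std::size_t kUtf8Bytes =
    sizeof(u8"a\u00e4\u00e1\u00e9\u00f6\u0151") - 1;  // -1 drops the '\0'

// One byte for 'a' plus two bytes for each of the five accented letters.
static_assert(kUtf8Bytes == 11, "expected 11 bytes of UTF-8");
```

The static_assert checks at compile time that the literal really holds 11 bytes of UTF-8, whatever encoding the editor used for the file.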