Since there was a lot of missinformation spread by several posters in the comments for this question: C++ ABI issues list
I have created this one to clarify.
Implementation defined. Or even application defined; the standard doesn't really put any restrictions on what an application does with them, and expects a lot of the behavior to depend on the locale. All that is really implemenation defined is the encoding used in string literals.
In what sense. Most of the OS ignores most of the encodings; you'll
have problems if '\0' isn't a nul byte, but even EBCDIC meets that
requirement. Otherwise, depending on the context, there will be a few
additional characters which may be significant (a '/' in path names,
for example); all of these use the first 128 encodings in Unicode, so
will have a single byte encoding in UTF-8. As an example, I've used
both UTF-8 and ISO 8859-1 for filenames under Linux. The only real
issue is displaying them: if you do ls in an xterm, for example,
ls and the xterm will assume that the filenames are in the same
encoding as the display font.
That mainly depends on the locale. Depending on the locale, it's quite possible for the internal encoding of a narrow character string not to correspond to that used for string literals. (But how could it be otherwise, since the encoding of a string literal must be determined at compile time, where as the internal encoding for narrow character strings depends on the locale used to read it, and can vary from one string to the next.)
If you're developing a new application in Linux, I would strongly recommend using Unicode for everything, with UTF-32 for wide character strings, and UTF-8 for narrow character strings. But don't count on anything outside the first 128 encoding points working in string literals.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With