How can I find out what the current charset is in C++?
In a console application (WinXP) I am getting negative values for some characters (like äöüé) with
and this surprises me. I was expecting the values to be between 127 and 256.
So is there something like GetCharset() or SetCharset() in c++?
It depends on how you look at the value you have at hand.
char can be signed(e.g. on Windows), or unsigned like on some other systems. So, what you should do is to print the value as unsigned to get what you are asking for.
C++ until now is char-set agnostic. For Windows console specifically, you can use:
CHAR_MAX if you don't like typing, or if you need an integer constant expression.
CHAR_MAX == UCHAR_MAX and
CHAR_MIN == 0 then chars are unsigned (as you expected). If
CHAR_MAX != UCHAR_MAX and
CHAR_MIN < 0 they are signed (as you're seeing).
In the standard 3.9.1/1, ensures that there are no other possibilities: "... a plain char can take on either the same values as a signed char or an unsigned char; which one is implementation-defined."
This tells you whether
char is signed or unsigned, and that's what's confusing you. You certainly can't call anything to modify it: from the POV of a program it's baked into the compiler even if the compiler has ways of changing it (GCC certainly does:
The usual way to deal with this is if you're going to cast a
int, cast it through
unsigned char first. So in your example,
(int)(unsigned char)mystring[a]. This ensures you get a non-negative value.
It doesn't actually tell you what charset your implementation uses for
char, but I don't think you need to know that. On Microsoft compilers, the answer is essentially that commonly-used character encoding "ISO-8859-mutter-mutter". This means that chars with 7-bit ASCII values are represented by that value, while values outside that range are ambiguous, and will be interpreted by a console or other recipient according to how that recipient is configured. ISO Latin 1 unless told otherwise.
Properly speaking, the way characters are interpreted is locale-specific, and the locale can be modified and interrogated using a whole bunch of stuff towards the end of the C++ standard that personally I've never gone through and can't advise on ;-)
Note that if there's a mismatch between the charset in effect, and the charset your console uses, then you could be in for trouble. But I think that's separate from your issue: whether chars can be negative or not is nothing to do with charsets, just whether char is signed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!Donate Us With