Not being able to wrap my head around this one is a real source of shame...
I'm working with a French version of Visual Studio (2008), on a French Windows (XP). French accents put into strings sent to the output window get corrupted. Ditto for input from the output window. Typical character-encoding issue: I enter ANSI, get UTF-8 in return, or something to that effect. What setting can ensure that the characters remain in ANSI when showing a "hardcoded" string to the output window?
EDIT:
Example:
#include <iostream>

int main()
{
    std::cout << "àéêù" << std::endl;
    return 0;
}
Will show in the output:
óúÛ¨
(here encoded as HTML for your viewing pleasure)
I would really like it to show:
àéêù
Before I go any further, I should mention that what you are doing is not C/C++ compliant. The specification states in 2.2 what character sets are valid in source code. It ain't much, and all the characters it guarantees are ASCII. So... everything below is about a specific implementation (as it happens, VC2008 on a US-locale machine).
To start with, you have 4 chars on your cout line, and 4 glyphs on the output. So the issue is not one of UTF-8 encoding, as that would combine multiple source bytes into fewer glyphs.
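To see that concretely, here is a minimal sketch of my own (not from the original post), with the é byte values written as hex escapes so the source file's encoding doesn't matter: CP1252 stores é as the single byte 0xE9 (233), while UTF-8 needs the two-byte sequence 0xC3 0xA9.

#include <cstring>
#include <iostream>

int main()
{
    const char cp1252_e[] = "\xE9";     // 'é' in CP1252: one byte (233)
    const char utf8_e[]   = "\xC3\xA9"; // 'é' in UTF-8: two bytes

    // Prints "1 2": a UTF-8-encoded accent takes more bytes than glyphs.
    std::cout << std::strlen(cp1252_e) << " " << std::strlen(utf8_e) << std::endl;
    return 0;
}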
From your source string to the display on the console, all these things play a part:
1. what encoding your source file is in (i.e. how your C++ file will be seen by the compiler)
2. what your compiler does with a string literal, and what source encoding it understands
3. how your << interprets the encoded string you're passing in
4. what encoding the console expects
5. how the console translates that output to a font glyph.
Now...
1 and 2 are fairly easy ones. It looks like the compiler guesses what format the source file is in and decodes it to its internal representation. It generates the data chunk corresponding to the string literal in the current codepage, no matter what the source encoding was. I have failed to find explicit details/control on this.
3 is even easier. Except for control codes, << just passes the data down for char*.
4 is controlled by SetConsoleOutputCP. It should default to your default system codepage. You can also figure out which one you have with GetConsoleOutputCP (the input is controlled differently, through SetConsoleCP).
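As a sketch (assuming, per the observation above, that VC stored the literal as CP1252 bytes; the hex escapes spell out àéêù in CP1252 so the example is independent of the source encoding):

#include <windows.h>
#include <iostream>

int main()
{
    // See which codepage the console currently expects for output.
    std::cout << "console output CP: " << GetConsoleOutputCP() << std::endl;

    // Tell the console the bytes we write are CP1252 (Windows Western European).
    SetConsoleOutputCP(1252);

    // "àéêù" spelled out as CP1252 bytes.
    std::cout << "\xE0\xE9\xEA\xF9" << std::endl;
    return 0;
}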
5 is a funny one. I banged my head trying to figure out why I could not get the é to show up properly using CP1252 (Western European, Windows). It turned out that my system font did not have the glyph for that character, and helpfully used the glyph of my standard codepage (capital Theta, the same I would get if I did not call SetConsoleOutputCP). To fix it, I had to change the font I use on consoles to Lucida Console (a TrueType font).
Some interesting things I learned looking at this:
- the encoding of the source file does not matter, as long as the compiler can figure it out (notably, changing it to UTF-8 did not change the generated code; my "é" string was still encoded as 233 0)
- VC picks a codepage for the string literals that I do not seem to control
- controlling what the console shows is more painful than I was expecting

So... what does this mean to you? Here are bits of advice:
- don't rely on non-ASCII characters in string literals; as noted above, the standard guarantees very little there
- make sure the console's codepage matches the encoding of the bytes you send, and that the console font has the glyphs for those characters
- if you want to figure out what encoding is being used in your case, print the actual value of the character as an integer:
char * a = "é"; std::cout << (unsigned int) (unsigned char) a[0];
does show 233 for me, which happens to be the encoding of é in CP1252.
BTW, if what you got was "ÓÚÛ¨" rather than what you pasted, then it looks like your 4 bytes are interpreted somewhere as CP850.
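Expanding that one-liner into a complete diagnostic program (a sketch of my own; the output shown assumes the literal was stored as CP1252):

#include <iostream>

int main()
{
    const char* s = "àéêù";
    // Print each byte's numeric value so you can look it up in a
    // codepage table (233 is 'é' in CP1252, for example).
    for (const char* p = s; *p != '\0'; ++p)
        std::cout << (unsigned int) (unsigned char) *p << ' ';
    std::cout << std::endl; // e.g. "224 233 234 249" if stored as CP1252
    return 0;
}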