Visual Studio Character Sets 'Not set' vs 'Multi byte character set'

Tags:

I've working with a legacy application and I'm trying to work out the difference between applications compiled with Multi byte character set and Not Set under the Character Set option.

I understand that compiling with Multi byte character set defines _MBCS which allows multi byte character set code pages to be used, and using Not set doesn't define _MBCS, in which case only single byte character set code pages are allowed.

In the case that Not Set is used, I'm assuming then that we can only use the single byte character set code pages found on this page: http://msdn.microsoft.com/en-gb/goglobal/bb964654.aspx

Therefore, am I correct in thinking that is Not Set is used, the application won't be able to encode and write or read far eastern languages since they are defined in double byte character set code pages (and of course Unicode)?

Following on from this, if Multi byte character set is defined, are both single and multi byte character set code pages available, or only multi byte character set code pages? I'm guessing it must be both for European languages to be supported.

Thanks,

Andy

Further Reading

The answers on these pages didn't answer my question, but helped in my understanding: About the "Character set" option in visual studio 2010

Research

So, just as working research... With my locale set as Japanese

Effect on hard coded strings

char *foo = "Jap text: テスト";
wchar_t *bar = L"Jap text: テスト";

Compiling with Unicode

*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-Jis (Code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2

Compiling with Multi byte character set

*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-Jis (Code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2

Compiling with Not Set

*foo = 4a 61 70 20 74 65 78 74 3a 20 83 65 83 58 83 67 == Shift-Jis (Code page 932)
*bar = 4a 00 61 00 70 00 20 00 74 00 65 00 78 00 74 00 3a 00 20 00 c6 30 b9 30 c8 30 == UTF-16 or UCS-2

Conclusion: The character encoding doesn't have any effect on hard coded strings. Although defining chars as above seems to use the Locale defined codepage and wchar_t seems to use either UCS-2 or UTF-16.

Using encoded strings in W/A versions of Win32 APIs

So, using the following code:

char *foo = "C:\\Temp\\テスト\\テa.txt";
wchar_t *bar = L"C:\\Temp\\テスト\\テw.txt";

CreateFileA(bar, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
CreateFileW(foo, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

Compiling with Unicode

Result: Both files are created

Compiling with Multi byte character set

Result: Both files are created

Compiling with Not set

Result: Both files are created

Conclusion: Both the A and W version of the API expect the same encoding regardless of the character set chosen. From this, perhaps we can assume that all the Character Set option does is switch between the version of the API. So the A version always expects strings in the encoding of the current code page and the W version always expects UTF-16 or UCS-2.

Opening files using W and A Win32 APIs

So using the following code:

char filea[MAX_PATH] = {0};
OPENFILENAMEA ofna = {0};
ofna.lStructSize = sizeof ( ofna );
ofna.hwndOwner = NULL  ;
ofna.lpstrFile = filea ;
ofna.nMaxFile = MAX_PATH;
ofna.lpstrFilter = "All\0*.*\0Text\0*.TXT\0";
ofna.nFilterIndex =1;
ofna.lpstrFileTitle = NULL ;
ofna.nMaxFileTitle = 0 ;
ofna.lpstrInitialDir=NULL ;
ofna.Flags = OFN_PATHMUSTEXIST|OFN_FILEMUSTEXIST ;  

wchar_t filew[MAX_PATH] = {0};
OPENFILENAMEW ofnw = {0};
ofnw.lStructSize = sizeof ( ofnw );
ofnw.hwndOwner = NULL  ;
ofnw.lpstrFile = filew ;
ofnw.nMaxFile = MAX_PATH;
ofnw.lpstrFilter = L"All\0*.*\0Text\0*.TXT\0";
ofnw.nFilterIndex =1;
ofnw.lpstrFileTitle = NULL;
ofnw.nMaxFileTitle = 0 ;
ofnw.lpstrInitialDir=NULL ;
ofnw.Flags = OFN_PATHMUSTEXIST|OFN_FILEMUSTEXIST ;

GetOpenFileNameA(&ofna);
GetOpenFileNameW(&ofnw);

and selecting either:

C:\Temp\テスト\テopenw.txt
C:\Temp\テスト\テopenw.txt

Yields:

When compiled with Unicode

*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-Jis (Code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == UTF-16 or UCS-2

When compiled with Multi byte character set

*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-Jis (Code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == UTF-16 or UCS-2

When compiled with Not Set

*filea = 43 3a 5c 54 65 6d 70 5c 83 65 83 58 83 67 5c 83 65 6f 70 65 6e 61 2e 74 78 74 == Shift-Jis (Code page 932)
*filew = 43 00 3a 00 5c 00 54 00 65 00 6d 00 70 00 5c 00 c6 30 b9 30 c8 30 5c 00 c6 30 6f 00 70 00 65 00 6e 00 77 00 2e 00 74 00 78 00 74 00 == UTF-16 or UCS-2

Conclusion: Again, the Character Set setting doesn't have a bearing on the behaviour of the Win32 API. The A version always seems to return a string with the encoding of the active code page and the W one always returns UTF-16 or UCS-2. I can actually see this explained a bit in this great answer: https://stackoverflow.com/a/3299860/187100.

Ultimate Conculsion

Hans appears to be correct when he says that the define doesn't really have any magic to it, beyond changing the Win32 APIs to use either W or A. Therefore, I can't really see any difference between Not Set and Multi byte character set.

218

asked Jul 19 '13 09:07

Andy

1 Answers

No, that's not really the way it works. The only thing that happens is that the macro gets defined, it doesn't otherwise have a magic effect on the compiler. It is very rare to actually write code that uses #ifdef _MBCS to test this macro.

You almost always leave it up to a helper function to make the conversion. Like WideCharToMultiByte(), OLE2A() or wctombs(). Which are conversion functions that always consider multi-byte encodings, as guided by the code page. _MBCS is an historical accident, relevant only 25+ years ago when multi-byte encodings were not common yet. Much like using a non-Unicode encoding is a historical artifact these days as well.

151

answered Nov 16 '22 01:11

Hans Passant

Related questions
                            
                                Which COM smart pointer classes to use?
                            
                                C++ interface multiple inheritance with same method
                            
                                Can storing unrelated data in the least-significant-bit of a pointer work reliably?
                            
                                Does C++/CX detect and solve cycles of objects?
                            
                                How to pass C# array to C++ and return it back to C# with additional items?
                            
                                Can I use $1 in regex_replace?
                            
                                C++ for Objective-C programmer [closed]
                            
                                Distinguishing between failure and end of file in read loop
                            
                                Dealing with pixels in contours (OpenCV)?
                            
                                Why are function parameters pushed earlier on call stack than the return address?
                            
                                A tool to auto separate C++ header and implementation
                            
                                Howto debug double deletes in C++?
                            
                                Unexpected results when using std::is_assignable, boost::function, and nullptr
                            
                                How to avoid overloaded assignment operator turning rvalue to lvalue?
                            
                                prevent gcc from searching the current dir "-I-" option on include search path
                            
                                Perfect Forwarding to async lambda
                            
                                Unable to step into a function in a shared library with GDB
                            
                                Could multiple proxy classes make up a STL-proof bitvector?
                            
                                Why is pointer_traits not defined for "T* const"?
                            
                                Micro-optimizing a c++ comparison function

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Visual Studio Character Sets 'Not set' vs 'Multi byte character set'

Tags:

c++

visual-studio

character-encoding

winapi

Andy

People also ask

1 Answers

Hans Passant

Recent Activity

Donate For Us