Windows API: ANSI and Wide-Character Strings -- Is it UTF8 or ASCII? UTF-16 or UCS-2 LE?

Tags:

I'm not quite pro with encodings, but here's what I think I know (though it may be wrong):

ASCII is a 7-bit, fixed-length encoding, with the characters you can find in ASCII charts.
UTF8 is an 8-bit, variable-length encoding. All characters can be written in UTF8.
UCS-2 LE/BE are fixed-length, 16-bit encodings that support most common characters.
UTF-16 is a 16-bit, variable-length encoding. All characters can be written in UTF16.

Are those above all correct?

Now, for the questions:

Do the Windows "A" functions (like SetWindowTextA) take in ASCII strings? Or "multi-byte strings" (more questions on this below)?
Do the Windows "W" functions take in UTF-16 strings or UCS-2 strings? I thought they take in UCS-2, but the names confuse me.
In WideCharToMultiByte, Microsoft uses the word "wide-character string" to mean UTF-16. In that context, then what is considered a "multi-byte string"? UTF-8?
Is LPWSTR a "wide-character string"? I would say it is, but then, wouldn't that mean it's UTF-16? And wouldn't that mean that it could be used to display, say, 4-byte characters? If not, then... is displaying 4-byte characters impossible? (Windows doesn't seem to have APIs for those.)
Is the functionality of WideCharToMultiByte a superset of that of wcstombs, and do they both work on the same type of string? Or does one, say, work on UTF-16 while the other works on UCS-2?
Are file paths in UTF-16 or UCS-2? I know Windows treats it as an "opaque array of characters" from Microsoft's documentation, but per the C standard for functions like fwprintf, is there any standardized encoding?
What is "ANSI" encoding? Is that even a correct term? And how does it relate to ASCII?
(I had more questions, but this is enough... I forgot some of them anyway...)

These are a lot of questions, so any links to explanations about how all these connect (aside from reading the Unicode standard, which won't help with the Windows API anyway) would also be greatly appreciated.

Thank you!

290

asked Jan 04 '11 09:01

user541686

1 Answers

Are those above all correct?

Yes, if you don't assume the existence of characters not encoded in Unicode (for most practical applications, this assumption is fine).

Do the Windows "A" functions (like SetWindowTextA) take in ASCII strings? Or "multi-byte strings" (more questions on this below)?

They take byte strings (i.e., strings whose code unit is a byte, which is always an octet on Windows) encoded in the current "ANSI"/MBCS/legacy encoding. "ANSI" is the historical terms for these encodings, but not correct. For Western Windows systems, this encoding is usually Windows-1252.

Do the Windows "W" functions take in UTF-16 strings or UCS-2 strings? I thought they take in UCS-2, but the names confuse me.

Since Windows 2000, most of them support UTF-16. The name "wide" and the rest of the Microsoft terminology (e.g., "Unicode" meaning "UTF-16" or "UCS") were chosen before the modern Unicode standard unified the terminology.

In WideCharToMultiByte, Microsoft uses the word "wide-character string" to mean UTF-16. In that context, then what is considered a "multi-byte string"? UTF-8?

Every other encoding that WideCharToMultiByte supports is a "multi-byte encoding" in this context, including Windows-1251 and UTF-8.

Is LPWSTR a "wide-character string"? I would say it is, but then, wouldn't that mean it's UTF-16? And wouldn't that mean that it could be used to display, say, 4-byte characters? If not, then... is displaying 4-byte characters impossible? (Windows doesn't seem to have APIs for those.)

LPWSTR is a pointer to wchar_t which is always a 16-bit unsigned integer on Windows. Which characters can be displayed is unrelated to the encoding as long as that encoding can encode all Unicode characters. Windows is generally able to display non-BMP characters, but not everywhere (e.g., the console cannot).

Is the functionality of WideCharToMultiByte a superset of that of wcstombs, and do they both work on the same type of string? Or does one, say, work on UTF-16 while the other works on UCS-2?

Don't really know, but I don't think they differ too much. I suppose you just try to convert some non-BMP character to UTF-8 and look whether the result is correct.

Are file paths in UTF-16 or UCS-2? I know Windows treats it as an "opaque array of characters" from Microsoft's documentation, but per the C standard for functions like fwprintf, is there any standardized encoding?

File paths are indeed opaque arrays of UTF-16 characters, meaning that Windows doesn't perform any kind of translation when storing or reading file names (like Linux and unlike Mac OS X). But Windows still has its weird mostly-undefined case insensitive behavior which causes much trouble because file names that are treated equivalent aren't necessarily equal. That breaks many invariants; for example, on Linux without interference from other threads, if you successfully create two files A and a in some directory, you'll end up with two distinct files, while on Windows you get only one file (and in general, an unpredictable number of files).

What is "ANSI" encoding? Is that even a correct term? And how does it relate to ASCII?

ANSI is the American standardization organization. Using this word when referring to encodings is a misnomer, but a frequent one, so you should be aware of it. I prefer the term legacy 8-bit encoding, because I think that's essentially what it is: a non-Unicode encoding that is kept only for compatibility with legacy (Windows 9x) applications. On Western systems, this is usually Windows-1252, which is a proper superset of ASCII.

answered Sep 19 '22 23:09

Philipp

Related questions
                            
                                Should DWORD map to int or uint?
                            
                                How to detect Windows 10 light/dark mode in Win32 application?
                            
                                How to programmatically unplug & replug an arbitrary USB device?
                            
                                How can I open a local HTML file in Microsoft Edge browser?
                            
                                What's a ".dll.a" file?
                            
                                How do I get the file HANDLE from the fopen FILE structure?
                            
                                <system_error> categories and standard/system error codes
                            
                                Win32: full-screen and hiding taskbar
                            
                                Detect active window changed using C# without polling
                            
                                Equivalent of gettimeday() for Windows
                            
                                How do I compute the non-client window size in WPF?
                            
                                What are the differences between BringWindowToTop, SetForegroundwindow, SetWindowPos etc.?
                            
                                Calling Win32 API method from Java
                            
                                What exactly is the scope of Access Violation '0xc0000005'?
                            
                                Show touch keyboard (TabTip.exe) in Windows 10 Anniversary edition
                            
                                How to copy string to clipboard in C?
                            
                                Gracefully Exit Explorer (Programmatically)
                            
                                Is Learning the win32 API Worthwhile? [closed]
                            
                                Alignment requirements for atomic x86 instructions vs. MS's InterlockedCompareExchange documentation?
                            
                                How to embed WebKit into my C/C++/Win32 application?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Windows API: ANSI and Wide-Character Strings -- Is it UTF8 or ASCII? UTF-16 or UCS-2 LE?

Tags:

unicode

ascii

winapi

widechar

multibyte-functions

user541686

People also ask

1 Answers

Philipp

Recent Activity

Donate For Us