
DllImport - ANSI vs. Unicode

Tags: c#, pinvoke

I have some questions about the possible answers to the test question below:

Question: You write the following code segment to call a function from the Win32 Application Programming Interface (API) by using platform invoke.

string personName = "N?el";
string msg = "Welcome" + personName + "to club!";
bool rc = User32API.MessageBox(0, msg, personName, 0);

You need to define a method prototype that can best marshal the string data. Which code segment should you use?

// A.
[DllImport("user32", CharSet = CharSet.Ansi)]
public static extern bool MessageBox(int hWnd, string text, string caption, uint type);

// B.
[DllImport("user32", EntryPoint = "MessageBoxA", CharSet = CharSet.Ansi)]
public static extern bool MessageBox(int hWnd,
    [MarshalAs(UnmanagedType.LPWStr)] string text,
    [MarshalAs(UnmanagedType.LPWStr)] string caption,
    uint type);

// C. - Correct answer
[DllImport("user32", CharSet = CharSet.Unicode)]
public static extern bool MessageBox(int hWnd, string text, string caption, uint type);

// D.
[DllImport("user32", EntryPoint = "MessageBoxA", CharSet = CharSet.Unicode)]
public static extern bool MessageBox(int hWnd,
    [MarshalAs(UnmanagedType.LPWStr)] string text,
    [MarshalAs(UnmanagedType.LPWStr)] string caption,
    uint type);

Why exactly is the correct answer C? Couldn't it just as well have been A? The only difference is that it would be ANSI instead of Unicode.

I understand that it couldn't be D, because it chooses Unicode as the character set but then names an ANSI function as the entry point.

Why wouldn't B work?

asked Jul 23 '13 10:07 by Kasper Hansen


1 Answer

 string personName = "N?el";

This string was garbled by the exact problem this question is asking about. No doubt it looked like this in the original:

 string personName = "Nöel";

The ö is the problem: its character code is outside the ASCII character set and might not be supported by the default system code page. That code page is what gets used when you P/Invoke the ANSI version of MessageBox, aka MessageBoxA. The real function is MessageBoxW, the one that takes a UTF-16 encoded Unicode string.

MessageBoxA is a legacy function from old versions of Windows, back in the olden days when programs still used 8-bit character strings. It isn't completely gone; lots of C and C++ programs still tend to be stuck with 8-bit encodings. MessageBoxA is implemented by converting the 8-bit encoded strings to Unicode and then calling MessageBoxW. That is slow and lossy if you had a Unicode string in the first place, as you always do in C#: a .NET string is UTF-16, so the marshaller first has to convert it to the 8-bit code page, and any character the code page lacks is replaced with a best-fit substitute or a question mark.
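
The round trip described above can be sketched in a few lines. This is only an illustration, not what the marshaller literally does: `Encoding.ASCII` is an assumed stand-in for a system code page that lacks ö (the real ANSI code page depends on the machine), and `EncodingDemo` and `RoundTrip` are hypothetical names.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Encode a managed (UTF-16) string to bytes and decode it again,
    // mimicking a string that crosses an encoding boundary and comes back.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // "Nöel", written with an escape to avoid source-file encoding issues.
        string personName = "N\u00F6el";

        // Assumption: ASCII stands in for an ANSI code page that cannot represent 'ö'.
        Console.WriteLine(RoundTrip(personName, Encoding.ASCII));

        // UTF-16, the encoding MessageBoxW expects, preserves every character.
        Console.WriteLine(RoundTrip(personName, Encoding.Unicode) == personName);
    }
}
```

The first line printed is `N?el`, the same garbled name that appears in the question; the second is `True`.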

So rating the 4 versions:

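A: uses MessageBoxA + 8-bit encoding, risky: characters missing from the system code page are lost.
B: uses MessageBoxA + Unicode, fail: LPWStr passes UTF-16 strings to a function that expects 8-bit ones.
C: uses MessageBoxW + Unicode, good: the UTF-16 strings are passed through without conversion.
D: uses MessageBoxA + Unicode, fail: same mismatch as B.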
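
For reference, here is a hedged sketch of what option C could look like in a complete program. It departs from the quiz in a few ways that are my assumptions, not part of the question: it names `MessageBoxW` explicitly with `ExactSpelling`, uses `IntPtr` for the window handle, and returns `int` (the Win32 function returns a button identifier rather than a boolean). The `Program` wrapper and the platform check are there only so the sketch compiles and runs harmlessly off Windows.

```csharp
using System;
using System.Runtime.InteropServices;

public static class User32API
{
    // Assumption: IntPtr handle and int return value, which follow the
    // documented Win32 signature more closely than the quiz's int/bool.
    [DllImport("user32.dll", EntryPoint = "MessageBoxW",
               CharSet = CharSet.Unicode, ExactSpelling = true)]
    public static extern int MessageBox(IntPtr hWnd, string text, string caption, uint type);
}

public static class Program
{
    public static void Main()
    {
        // user32.dll only exists on Windows, so skip the call elsewhere.
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            Console.WriteLine("MessageBoxW is only available on Windows.");
            return;
        }

        string personName = "N\u00F6el"; // "Nöel"
        string msg = "Welcome " + personName + " to club!";

        // 0 is MB_OK: a box with a single OK button.
        User32API.MessageBox(IntPtr.Zero, msg, personName, 0);
    }
}
```

Because both strings are marshalled as UTF-16 and handed straight to MessageBoxW, the ö should appear intact in the text and the caption.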

answered Sep 20 '22 09:09 by Hans Passant