I'm really confused about passing strings from VBA to C++. Here's the VBA code:
Private Declare Sub passByBSTRVal Lib "foo.dll" (ByVal s As String)
Private Declare Sub passByBSTRRef Lib "foo.dll" (ByRef s As String)
Private Declare Sub passByNarrowVal Lib "foo.dll" (ByVal s As String)
Private Declare Sub passByNarrowRef Lib "foo.dll" (ByRef s As String)
Private Declare Sub passByWideVal Lib "foo.dll" (ByVal s As String)
Private Declare Sub passByWideRef Lib "foo.dll" (ByRef s As String)
Sub foobar()
Dim s As String, str As String
str = "Hello There, World!"
s = str
Call passByBSTRVal(s)
s = str
Call passByBSTRRef(s)
s = str
Call passByNarrowVal(s)
s = str
Call passByNarrowRef(s)
s = str
Call passByWideVal(s)
s = str
Call passByWideRef(s)
End Sub
And the C++ DLL code:
void __stdcall passByBSTRVal( BSTR s )
{
MessageBox(NULL, s, L"Pass BSTR by value", MB_OK | MB_ICONINFORMATION);
}
void __stdcall passByBSTRRef( BSTR *s )
{
MessageBox(NULL, *s, L"Pass BSTR by ref", MB_OK | MB_ICONINFORMATION);
}
void __stdcall passByNarrowVal( LPCSTR s )
{
USES_CONVERSION;
MessageBox(NULL, A2W(s), L"Pass by Narrow Val", MB_OK | MB_ICONINFORMATION);
}
void __stdcall passByNarrowRef( LPCSTR* s )
{
USES_CONVERSION;
MessageBox(NULL, A2W(*s), L"Pass by Narrow Ref", MB_OK | MB_ICONINFORMATION);
}
void __stdcall passByWideVal( LPCWSTR s )
{
MessageBox(NULL, s, L"Pass by Wide Val", MB_OK | MB_ICONINFORMATION);
}
void __stdcall passByWideRef( LPCWSTR* s )
{
MessageBox(NULL, *s, L"Pass by Wide Ref", MB_OK | MB_ICONINFORMATION);
}
My expectation was that the first two calls, to passByBSTRVal and passByBSTRRef, would work. Why? Because VBA strings are COM BSTR objects. However, while stepping through the C++ code, the value of s for both of these functions was garbage (a bunch of kanji), and the message box displayed the same garbage. I'm really surprised the first two functions didn't work.
My next expectation was for the second two calls, to passByNarrowVal and passByNarrowRef, to not work, because a BSTR is defined as "typedef OLECHAR *BSTR", and an OLECHAR is a wide character type, while LPCSTR is a narrow character type. However, contrary to my expectation, these two functions actually did work. When I stepped through the C++ code, the parameter s was exactly what I was expecting it to be. My expectation was wrong again.
Lastly, my expectation for the final two functions (pass by wide val and ref) was that they would work, since a BSTR is a string of OLECHARs, which are wide characters, so an LPCWSTR should be able to point to a BSTR. But as with case #1 (I guess these two cases are identical) my expectation was wrong. The parameter s was made up of garbage characters (and the MessageBox displayed the same garbage characters).
Why was my intuition completely wrong? Can someone please explain what I'm not understanding here?
Here are some old reference articles; they're worth reading because they explain the root causes of all our problems:
To sum up:
String data type, the outside world is always ANSI, ASCIIZ, CodePage, etc. ). So, even if it still uses a BSTR, that BSTR contains the ANSI equivalent of the internal Unicode storage, modulo the current locale (a BSTR is like an envelope that can contain anything, including ANSI, including zero characters anywhere, provided the length matches the data).
So when you use Declare with an argument of type String, the final binary layout will always match C's ANSI 'char *' (or LPSTR in Windows macro parlance). Officially, you're still supposed to use VARIANTs if you want to pass full Unicode strings over interop barriers (read the links for more on this).
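To make the "envelope" remark concrete, here is a small portable C++ sketch (not Windows code). It uses std::string as a stand-in for a length-prefixed BSTR; the helper names are made up for illustration. The point is only that a length-carrying string keeps bytes after an embedded zero, while anything that reads it as a plain 'char *' stops at the first zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds the 5-byte payload "ab\0cd" with an explicit
// length, standing in for a BSTR whose length prefix says 5 bytes.
std::string makePayloadWithEmbeddedNul() {
    return std::string("ab\0cd", 5);
}

// Hypothetical helper: the length a C-style (NUL-terminated) reader sees.
std::size_t lengthAsCString(const std::string& payload) {
    return std::strlen(payload.c_str());
}
```

With this payload, size() reports 5 while lengthAsCString reports 2, which is the same mismatch you get when a callee treats a BSTR with embedded zeros as an LPSTR.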
But, not all is lost, as VBA (not VB) has been a bit improved over the years, mainly to support Office 64-bit versions.
The LongPtr data type has been introduced. It's a type that will be a signed 32 bit integer on a 32-bit system and a signed 64 bit integer on a 64-bit system.
Note it's the exact equivalent of .NET's IntPtr (VBA also still thinks a Long is 32-bit and an Integer is 16-bit, while .NET uses Long for 64-bit and Int for 32-bit...).
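On the C++ side, the closest standard analogue I know of is std::intptr_t. The sketch below is only an illustration of the "pointer-sized integer" idea, with a made-up function name; it assumes a mainstream platform where std::intptr_t exists and is as wide as a pointer.

```cpp
#include <cassert>
#include <cstdint>

// Round-trips a pointer through std::intptr_t, used here as the C++
// counterpart of a LongPtr value received from VBA.
bool roundTripsThroughIntPtr(const void* p) {
    std::intptr_t asInteger = reinterpret_cast<std::intptr_t>(p);
    const void* back = reinterpret_cast<const void*>(asInteger);
    return back == p;
}
```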
Now, LongPtr would be useless w/o the help of VB's all-time undocumented function StrPtr that takes a string and returns a LongPtr. It's undocumented because officially VB doesn't know what a pointer is (actually, be cautious as this can crash your program at runtime if not used properly).
So, let's suppose this C code:
STDAPI ToUpperLPWSTR(LPCWSTR in, LPWSTR out, int cch)
{
// unicode version
LCMapStringW(LOCALE_USER_DEFAULT, LCMAP_LINGUISTIC_CASING | LCMAP_UPPERCASE, in, lstrlenW(in), out, cch);
return S_OK;
}
STDAPI ToUpperBSTR(BSTR in, BSTR out, int cch)
{
// unicode version
// note the usage SysStringLen here. I can do it because it's a BSTR
// and it's slightly faster than calling lstrlen...
LCMapStringW(LOCALE_USER_DEFAULT, LCMAP_LINGUISTIC_CASING | LCMAP_UPPERCASE, in, SysStringLen(in), out, cch);
return S_OK;
}
STDAPI ToUpperLPSTR(LPCSTR in, LPSTR out, int cch)
{
// ansi version
LCMapStringA(LOCALE_USER_DEFAULT, LCMAP_LINGUISTIC_CASING | LCMAP_UPPERCASE, in, lstrlenA(in), out, cch);
return S_OK;
}
Then you can call it with these VBA declares (note this code is 32 and 64-bit compatible):
Private Declare PtrSafe Function ToUpperLPWSTR Lib "foo.dll" (ByVal ins As LongPtr, ByVal out As LongPtr, ByVal cch As Long) As Long
Private Declare PtrSafe Function ToUpperBSTR Lib "foo.dll" (ByVal ins As LongPtr, ByVal out As LongPtr, ByVal cch As Long) As Long
Private Declare PtrSafe Function ToUpperLPSTR Lib "foo.dll" (ByVal ins As String, ByVal out As String, ByVal cch As Long) As Long
Sub Button1_Click()
Dim result As String
result = String(256, 0)
' note I use a special character 'é' to make sure it works
' I can't use any unicode character because VBA's IDE has not been updated and does not support the
' whole unicode range (internally it does, but you'll have to store the texts elsewhere, and load it as an opaque thing w/o the IDE involved)
ToUpperLPWSTR StrPtr("héllo world"), StrPtr(result), 256
MsgBox result
ToUpperBSTR StrPtr("héllo world"), StrPtr(result), 256
MsgBox result
ToUpperLPSTR "héllo world", result, 256
MsgBox result
End Sub
They all work, however
This form of external function call exists to be compatible with earlier versions of Visual Basic, and inherits their semantics. In particular, VB3 ran on 16-bit windows and dealt only with ANSI (i.e. MBCS) strings.
The Declare syntax has the same restriction. VBA converts your string on the assumption that it is converting it from UTF-16 to ANSI. This allows code written in VB3 to work unchanged in VB4, VB5 and VB6.
So for example "AZ" begins as \u0041\u005A, is converted to ANSI and becomes \x41\x5A, which is reinterpreted as \u5A41, which is "婁".
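The "AZ" arithmetic can be checked with a few lines of portable C++. This is just a sketch with a made-up function name: it takes the first two ANSI bytes and combines them the way a little-endian reader (which is what Windows is) would when treating them as one UTF-16 code unit.

```cpp
#include <cassert>
#include <cstdint>

// Combines the first two bytes of an ANSI string into one UTF-16 code unit,
// low byte first, as a wide-character reader on Windows would see them.
std::uint16_t firstTwoBytesAsUtf16Unit(const char* ansi) {
    const unsigned lo = static_cast<unsigned char>(ansi[0]);
    const unsigned hi = static_cast<unsigned char>(ansi[1]);
    return static_cast<std::uint16_t>(lo | (hi << 8));
}
```

Calling it on "AZ" gives 0x5A41: 'A' (0x41) lands in the low byte and 'Z' (0x5A) in the high byte, so the callee sees a single CJK character instead of two Latin letters.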
(With VB4, Microsoft merged WordBasic, Excel Basic and Visual basic into a single language, VBA.)
The "new" way to call functions from VBA is to create a type library for the external functions you need to use, using MIDL, and add it as a reference to the project. Type libraries can describe the exact signature of the function (e.g. BSTR, LPCSTR, LPCWSTR, [out]BSTR*, etc.). In particular it is not necessary to wrap the functions in a COM object to call them from VBA (though it is if you wish to call them from VBScript).
module: https://msdn.microsoft.com/en-us/library/windows/desktop/aa367099(v=vs.85).aspx
Alternatively, if you can't be bothered to fire up midl for a single function, you can use the VarPtr/StrPtr/CopyMemory hack. This is pretty much equivalent to PEEK and POKE.
BIG HUGE NOTE: I'm not a programmer, I just really enjoy programming, so please be kind to me. I want to improve, so suggestions and comments from people more skilled than me (basically, everyone) are VERY welcomed!
Ben, if you're reading this, I think you opened my eyes to what's happening. MIDL sounds like the proper way of doing this, and I intend on learning it, but this seemed like a good learning opportunity, and I never let those pass me by!
I think what's happening is that narrow characters are getting marshalled into a wide character storage. For example, the string "hello" stored with narrow characters looks like:
|h |e |l |l |o |\0|
and stored with wide characters, looks like:
|h   |e   |l   |l   |o   |\0  |
But when you pass a string from VBA to C++, something really strange happens. You get narrow characters marshalled into a wide character, like this:
|h e |l l |o \0|    |    |    |
This is why using LPCSTR / LPCSTR* works. Yes, BSTR uses a string of wchar_t, but this marshalling makes it look like a string of char. Accessing with char* alternately points to the first and second characters in each half of the wchar_t (h, then e. l, then l. o, then \0). Even though the pointer arithmetic for char* and wchar_t* is different, it works because of the funny way the characters are marshalled. In fact, we're passed a pointer to the data string, but if you wanted to access the length of the BSTR, 4 bytes before the data string, you can play games with pointer arithmetic to get where you want to go. Assuming the BSTR is passed in as LPCSTR s,
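const char* ptrToChar; // steps 1 byte at a time
const wchar_t* ptrToWChar; // steps 2 bytes at a time (on Windows)
const int* ptrToInt; // steps 4 bytes at a time
size_t len; // not named strlen, so it doesn't shadow the C library function
// All three expressions address the 4-byte length prefix (a byte count) just
// before the data, but only the int read fetches all 4 bytes.
ptrToChar = (const char *) s;
len = (unsigned char) ptrToChar[-4]; // low byte only: right for lengths below 256
ptrToWChar = (const wchar_t *) s;
len = ptrToWChar[-2]; // low 16 bits only: right for lengths below 65536
ptrToInt = (const int *) s;
len = ptrToInt[-1]; // the full 32-bit length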
If the string was passed in as LPCSTR* s instead, you need to dereference s first, with something like:
ptrToChar = (char *)(*s);
and so on.
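Since the real thing needs Windows and a VBA host, here is a portable C++ sketch of the same layout for experimenting. FakeBstr, makeFakeBstr and readLengthPrefix are names I made up; the buffer only imitates the layout described above (a 4-byte byte count, then the data, then a terminator) and is not a real BSTR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a BSTR allocation: [4-byte byte count][data][terminator].
struct FakeBstr {
    std::vector<unsigned char> storage;
    // Like a real BSTR, the pointer handed out points at the data, past the prefix.
    const char* data() const {
        return reinterpret_cast<const char*>(storage.data()) + 4;
    }
};

FakeBstr makeFakeBstr(const char* bytes, std::uint32_t byteCount) {
    FakeBstr b;
    b.storage.resize(4 + byteCount + 2);  // zero-filled, so the terminator is already there
    std::memcpy(b.storage.data(), &byteCount, 4);
    std::memcpy(b.storage.data() + 4, bytes, byteCount);
    return b;
}

// Reads the whole 4-byte prefix that sits just before the data pointer.
std::uint32_t readLengthPrefix(const char* data) {
    std::uint32_t byteCount = 0;
    std::memcpy(&byteCount, data - 4, sizeof byteCount);
    return byteCount;
}
```

Using memcpy for the read avoids the alignment and aliasing questions that come with casting the data pointer to int*, and it always fetches all four bytes, so a 300-byte string reports 300 rather than a truncated value.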
If one wants to use LPCWSTR or BSTR to receive the VBA string, you have to dance around this marshalling. So for example, to create a C++ DLL that converts a VBA string to uppercase, I did the following:
BSTR __stdcall pUpper( LPCWSTR* s )
{
// Get the string length in bytes from the 4-byte prefix (see previous discussion).
// Reading it as (*s)[-2] would only fetch the low 16 bits.
int len = ((const int *)(*s))[-1];
// Allocate space for the new string (+1 for the NUL character).
char *dest = new char[len + 1];
// Accessing the *LPCWSTR s using a (char *) changes what we mean by ptr arithmetic,
// e.g. p[1] hops forward 1 byte. (*s)[1] hops forward 2 bytes.
const char *p = (const char *)(*s);
// Copy the string data, uppercasing each byte. The cast to unsigned char keeps
// toupper well-defined for bytes above 127.
for( int i = 0; i < len; ++i )
dest[i] = (char) toupper((unsigned char) p[i]);
// And we're done!
dest[len] = '\0';
// Create a new BSTR from our heap-allocated buffer (SysAllocStringByteLen copies it).
BSTR bstr = SysAllocStringByteLen(dest, len);
// dest must be freed by us, with delete[] because it came from new[].
// The caller takes ownership of bstr.
delete[] dest;
return bstr;
}
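For anyone who wants to test the uppercasing loop without a DLL, here is a portable C++ sketch of just that step. The function name is made up, and std::string replaces the new[]/SysAllocStringByteLen pair, so this shows the byte-wise idea only, not the COM allocation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Uppercases byteCount bytes starting at data, one narrow character at a time.
std::string upperCaseBytes(const char* data, std::size_t byteCount) {
    std::string result(data, byteCount);
    for (char& c : result) {
        // std::toupper expects a value representable as unsigned char (or EOF),
        // so cast before calling it.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return result;
}
```

In the default "C" locale this only changes a to z, so accented ANSI characters such as 'é' pass through untouched; handling those would need a locale-aware conversion.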
As far as I can tell, receiving the BSTR as a BSTR is equivalent to receiving it as a LPCWSTR, and receiving it as a BSTR* is equivalent to receiving it as a LPCWSTR*.
OK, I am 100% certain there are a ton of mistakes here, but I believe the underlying ideas are correct. If there are mistakes or even better ways of thinking of something, I will gladly accept corrections / explanations, and fix them for Google, posterity, and future programmers.
It sounds like the BEST way to do this is with Ben's MIDL suggestion (and maybe MIDL will make Safearrays and Variants less complicated?), and after I hit enter, I'm going to start learning that method. But this method works too and was an excellent learning opportunity for me.