Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Passing strings from VBA to C++ DLL

Tags:

c++

excel

vba

I'm really confused about passing strings from VBA to C++. Here's the VBA code:

Private Declare Sub passBSTRVal Lib "foo.dll" (ByVal s As String)
Private Declare Sub passBSTRRef Lib "foo.dll" (ByRef s As String)
Private Declare Sub passByNarrowVal Lib "foo.dll" (ByVal s As String)
Private Declare Sub passByNarrowRef Lib "foo.dll" (ByRef s As String)
Private Declare Sub passByWideVal Lib "foo.dll" (ByVal s As String)
Private Declare Sub passByWideRef Lib "foo.dll" (ByRef s As String)

Sub foobar()
    Dim s As String, str As String
    str = "Hello There, World!"

    s = str
    Call passByBSTRVal(s)
    s = str
    Call passByBSTRRef(s)
    s = str
    Call passByNarrowVal(s)
    s = str
    Call passByNarrowRef(s)
    s = str
    Call passByWideVal(s)
    s = str
    Call passByWideRef(s)
End Sub

And the C++ DLL code:

void __stdcall passByBSTRVal( BSTR s )
{
    MessageBox(NULL, s, L"Pass BSTR by value", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByBSTRRef( BSTR *s )
{
    MessageBox(NULL, *s, L"Pass BSTR by ref", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByNarrowVal( LPCSTR s )
{
    USES_CONVERSION;
    MessageBox(NULL, A2W(s), L"Pass by Narrow Val", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByNarrowRef( LPCSTR* s )
{
    USES_CONVERSION;
    MessageBox(NULL, A2W(*s), L"Pass by Narrow Ref", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByWideVal( LPCWSTR s )
{
    MessageBox(NULL, s, L"Pass by Wide Val", MB_OK | MB_ICONINFORMATION);
}

void __stdcall passByWideRef( LPCWSTR* s )
{
    MessageBox(NULL, *s, L"Pass by Wide Ref", MB_OK | MB_ICONINFORMATION);
}

My expectation was that the first two calls to passByBSTRVal and passByBSTRRef would work. Why? Because VBA strings are COM BSTR objects. However, while stepping through the C++ code, the value of s for both of these functions was garbage (a bunch of kanji). Additionally, the displayed message box was (the same). I'm really surprised the first two functions didn't work.

My next expectation was for the second two calls to passByNarrowVal and passByNarrowRef to not work because a BSTR is defined as "typedef OLECHAR *BSTR", and an OLECHAR is a wide character type, while LPCSTR is a narrow character type. However, contrary to my expectation, these two functions actually did work. When I stepped through the C++ code, the parameter s was exactly what i was expecting it to be. My expectation was wrong again.

Lastly, my expectation for the final 2 functions (pass by wide val and ref) was that they would work, since an OLECHAR is a string of wide characters, so a LPCWSTR should be able to point to a BSTR. But as with case #1 (I guess these two cases are identical) my expectation was wrong. The parameter s was made up of garbage characters (and the MessageBox displayed the same garbage characters.)

Why was my intuition completely wrong? Can someone please explain what I'm not understanding here?

like image 766
Mathematician Avatar asked Sep 09 '16 04:09

Mathematician


3 Answers

Here are some old reference articles, it's worth reading because it explains the root causes of all our problems:

  • How to: Access DLLs in Excel
  • And two older VB ones (VBA is an updated VB runtime): Passing Strings to a DLL Procedure Declaring a DLL Procedure

To sum up:

  • VBA internal storage is BSTR with unicode characters in it.
  • VBA also uses BSTR for talking with the external world, but you don't have to use BSTR if you don't want to because from C/C++, you may choose to use only the pointer part of the BSTR (a BSTR is a LPWSTR, an LPWSTR is not a BSTR).
  • The content of the BSTR that VBA uses to communicate outside its world is not unicode but ANSI (VBA is still living in the 90s and thinks that, regarding the String data type, the outside world is always ANSI, ASCIIZ, CodePage, etc. ). So, even if it still uses a BSTR, that BSTR contains the ANSI equivalent of the internal Unicode storage, modulo the current locale (BSTR is like an envelope that can contain anything, including ANSI, including zero characters anywhere, provided the length matches the data).

So when you use use Declare with argument of type String, the final binary layout will always match C's ANSI 'char *' (or LPSTR in windows macro parlance). Officially, you're still supposed to use VARIANTs if you want to pass full unicode string over interop barriers (read the links for more on this).

But, not all is lost, as VBA (not VB) has been a bit improved over the years, mainly to support Office 64-bit versions.

The LongPtr data type has been introduced. It's a type that will be a signed 32 bit integer on a 32-bit system and a signed 64 bit integer on a 64-bit system.

Note it's the exact equivalent of .NET's IntPtr (VBA also still thinks a Long is 32-bit and an Integer is 16-bit, while .NET uses Long for 64-bit and Int for 32-bit...).

Now, LongPtr would be useless w/o the help of VB's all-time undocumented function StrPtr that takes a string and returns a LongPtr. It's undocumented because officially VB doesn't know what a pointer is (actually, be cautious as this can crash your program at runtime if not used properly).

So, let's suppose this C code:

  STDAPI ToUpperLPWSTR(LPCWSTR in, LPWSTR out, int cch)
  {
    // unicode version
    LCMapStringW(LOCALE_USER_DEFAULT, LCMAP_LINGUISTIC_CASING | LCMAP_UPPERCASE, in, lstrlenW(in), out, cch);
    return S_OK;
  }

  STDAPI ToUpperBSTR(BSTR in, BSTR out, int cch)
  {
    // unicode version
    // note the usage SysStringLen here. I can do it because it's a BSTR
    // and it's slightly faster than calling lstrlen...
    LCMapStringW(LOCALE_USER_DEFAULT, LCMAP_LINGUISTIC_CASING | LCMAP_UPPERCASE, in, SysStringLen(in), out, cch);
    return S_OK;
  }

  STDAPI ToUpperLPSTR(LPCSTR in, LPSTR out, int cch)
  {
    // ansi version
    LCMapStringA(LOCALE_USER_DEFAULT, LCMAP_LINGUISTIC_CASING | LCMAP_UPPERCASE, in, lstrlenA(in), out, cch);
    return S_OK;
  }

Then you can call it with these VBA declares (note this code is 32 and 64-bit compatible):

  Private Declare PtrSafe Function ToUpperLPWSTR Lib "foo.dll" (ByVal ins As LongPtr, ByVal out As LongPtr, ByVal cch As Long) As Long
  Private Declare PtrSafe Function ToUpperBSTR Lib "foo.dll" (ByVal ins As LongPtr, ByVal out As LongPtr, ByVal cch As Long) As Long
  Private Declare PtrSafe Function ToUpperLPSTR Lib "foo.dll" (ByVal ins As String, ByVal out As String, ByVal cch As Long) As Long

  Sub Button1_Click()

      Dim result As String
      result = String(256, 0)

      // note I use a special character 'é' to make sure it works
      // I can't use any unicode character because VBA's IDE has not been updated and does not suppport the
      // whole unicode range (internally it does, but you'll have to store the texts elsewhere, and load it as an opaque thing w/o the IDE involved)

      ToUpperLPWSTR StrPtr("héllo world"), StrPtr(result), 256
      MsgBox result
      ToUpperBSTR StrPtr("héllo world"), StrPtr(result), 256
      MsgBox result
      ToUpperLPSTR "héllo world", result, 256
      MsgBox result
  End Sub

They all work, however

  • the ToUpperLPSTR is an ANSI fonction, so it will not support the unicode range that most people use nowadays. It works for me because the special non ASCII 'é' character coded in the IDE will find a correspondance when I run it in my machine with my ANSI codepage. But it may not work depending on where it runs. With unicode, you don't have that kind of problems.
  • the ToUpperBSTR is kinda specialized for VBA (COM automation) clients. If this function is called from a C/C++ client, the C/C++ coder will have to create a BSTR to use it, so it may look funny and add more work. Note however it will support strings that contains the zero character in it, thanks to the way BSTR work. It can sometimes be useful for example to pass array of bytes or special strings.
like image 69
Simon Mourier Avatar answered Oct 21 '22 14:10

Simon Mourier


This form of external function call exists to be compatible with earlier versions of Visual Basic, and inherits their semantics. In particular, VB3 ran on 16-bit windows and dealt only with ANSI (i.e. MBCS) strings.

The Declare syntax has the same restriction. VBA converts your string on the assumption that it is converting it from UTF-16 to ASCII. This allows code written in VB3 to work unchanged in VB4, VB5 and VB6.

So for example "AZ" begins as \u0041\u005A, is converted to ANSI and becomes \x41\x5A which is reinterpreted as \u5A41 which is "婁".

(With VB4, Microsoft merged WordBasic, Excel Basic and Visual basic into a single language, VBA.)

The "new" way to call functions from VBA, is to create a type library for the external functions you need to use, using MIDL, and add it as a reference to the project. Type libraries can describe the exact signature of the function, (e.g. BSTR, LPCSTR, LPCWSTR, [out]BSTR*, etc.) In particular it is not necessary to wrap the functions in a COM object to call them from VBA (though it is if you wish to call them from VBScript).

  • A set of DLL functions is described as a MIDL module: https://msdn.microsoft.com/en-us/library/windows/desktop/aa367099(v=vs.85).aspx

Alternatively you can't be bothered to fire up midl for a single function, you can use the VarPtr/StrPtr/CopyMemory hack. This is pretty much equivalent to PEEK and POKE.

like image 25
Ben Avatar answered Oct 21 '22 15:10

Ben


BIG HUGE NOTE: I'm not a programmer, I just really enjoy programming, so please be kind to me. I want to improve, so suggestions and comments from people more skilled than me (basically, everyone) are VERY welcomed!

Ben, if you're reading this, I think you opened my eyes to what's happening. MIDL sounds like the proper way of doing this, and I intend on learning it, but this seemed like a good learning opportunity, and I never let those pass me by!

I think what's happening is that narrow characters are getting marshalled into a wide character storage. For example, the string "hello" stored with narrow characters looks like:

|h |e |l |l |o |\0 |

and stored with wide characters, looks like:

|h   |e   |l   |l   |o   |\0   |

But when you pass a string from VBA to C++, something really strange happens. You get narrow characters marshalled into a wide character, like this:

|h e |l l |o \0 |    |    |    |

This is why using LPCSTR / LPCSTR* works. Yes, BSTR uses a string of wchar_t, but this marshalling makes it look like a string of char. Accessing with char* alternately points to the first and second characters in each half of the wchar_t (h, then e. l, then l. o, then \0). Even though the pointer arithmetic for char* and wchar_t* is different, it works because of the funny way the characters are marshalled. In fact, we're passed a pointer to the data string, but if you wanted to access the length of the BSTR, 4 bytes before the data string, you can play games with pointer arithmetic to get where you want to go. Assuming the BSTR is passed in as LPCSTR s,

char* ptrToChar;      // 1 byte
wchar_t* ptrToWChar;  // 2 bytes
int* ptrToInt;        // 4 bytes
size_t strlen;

ptrToChar = (char *) s;
strlen = ptrToChar[-4];

ptrToWChar = (wchar_t *) s;
strlen = ptrToWChar[-2];

ptrToInt = (int *) s;
strlen = ptrToInt[-1];

Of course, if the string got passed in as LPCSTR* s, then of course you need to dereference s first by accessing via something like:

ptrToChar = (char *)(*s);

and so on.

If one wants to use LPCWSTR or BSTR to receive the VBA string, you have to dance around this marshalling. So for example, to create a C++ DLL that converts a VBA string to uppercase, I did the following:

BSTR __stdcall pUpper( LPCWSTR* s )
{
    // Get String Length (see previous discussion)
    int strlen = (*s)[-2];

    // Allocate space for the new string (+1 for the NUL character).
    char *dest = new char[strlen + 1];

    // Accessing the *LPCWSTR s using a (char *) changes what we mean by ptr arithmetic,
    // e.g. p[1] hops forward 1 byte.  s[1] hops forward 2 bytes.
    char *p = (char *)(*s);

    // Copy the string data
    for( int i = 0; i < strlen; ++i )
        dest[i] = toupper(p[i]);

    // And we're done!
    dest[strlen] = '\0';

    // Create a new BSTR using our mallocated string.
    BSTR bstr = SysAllocStringByteLen(dest, strlen);

    // dest needs to be garbage collected by us.  COM will take care of bstr.
    delete dest;
    return bstr;
}

As far as I can tell, receiving the BSTR as a BSTR is equivalent to receiving it as a LPCWSTR, and receiving it as a BSTR* is equivalent to receiving it as a LPCWSTR*.

OK, I am 100% certain there are a ton of mistakes here, but I believe the underlying ideas are correct. If there are mistakes or even better ways of thinking of something, I will gladly accept corrections / explanations, and fix them for Google, posterity, and future programmers.

It sounds like the BEST way to do this is with Ben's MIDL suggestion (and maybe MIDL will make Safearrays and Variants less complicated?), and after I hit enter, I'm going to start learning that method. But this method works too and was an excellent learning opportunity for me.

like image 37
Mathematician Avatar answered Oct 21 '22 15:10

Mathematician