 

Passing reference to STL vector over dll boundary

I have a nice library for managing files that needs to return specific lists of strings. Since the only code I'm ever going to use it with is C++ (and Java, but that goes through C++ via JNI), I decided to use std::vector from the standard library. The library functions look a little bit like this (where FILE_MANAGER_EXPORT is the platform-specific export macro):

extern "C" FILE_MANAGER_EXPORT void get_all_files(vector<string> &files)
{
    files.clear();
    for (vector<file_struct>::iterator i = file_structs.begin(); i != file_structs.end(); ++i)
    {
        files.push_back(i->full_path);
    }
}

The reason I pass the vector by reference instead of returning it is partly an attempt to keep memory allocations sane, and partly because Windows was really unhappy with me putting extern "C" on a function with a C++ return type (who knows why; my understanding is that all extern "C" does is prevent name mangling in the compiler). Anyway, the code for using this from other C++ code generally looks like this:

#include <string>
#include <vector>

#if defined _WIN32
    #include <Windows.h>
    #define GET_METHOD GetProcAddress
    #define OPEN_LIBRARY(X) LoadLibraryA(X)
    #define LIBRARY_POINTER_TYPE HMODULE
    #define CLOSE_LIBRARY FreeLibrary
#else
    #include <dlfcn.h>
    #define GET_METHOD dlsym
    #define OPEN_LIBRARY(X) dlopen(X, RTLD_NOW)
    #define LIBRARY_POINTER_TYPE void*
    #define CLOSE_LIBRARY dlclose
#endif

using std::string;
using std::vector;

typedef void (*GetAllFilesType)(vector<string> &files);

int main(int argc, char **argv)
{
    LIBRARY_POINTER_TYPE manager = OPEN_LIBRARY("library.dll"); // Just an example; the actual name is platform-defined too
    if (!manager)
        return 1; // library could not be loaded
    GetAllFilesType get_all_files_pointer = (GetAllFilesType) GET_METHOD(manager, "get_all_files");
    if (!get_all_files_pointer)
        return 1; // symbol not exported
    vector<string> files;
    (*get_all_files_pointer)(files);

    // ... Do something with files ...

    return 0;
}

The library is compiled through CMake using add_library(file_manager SHARED file_manager.cpp). The program is compiled in a separate CMake project using add_executable(file_manager_command_wrapper command_wrapper.cpp). No compile flags are specified for either, just those commands.

Now, the program works perfectly fine on both Mac and Linux. The problem is Windows. When I run it, I get this error:

Debug Assertion Failed!

...

Expression: _pFirstBlock == _pHead

This, I have found out and roughly understand, is because an executable and the DLLs it loads can have separate memory heaps. I believe the error occurs when memory is allocated on one heap and deallocated on the other. The problem is that, for the life of me, I can't figure out what is going wrong. The memory is allocated in the executable and passed as a reference to the DLL function, values are added via the reference, and then those are processed and finally deallocated back in the executable.

I would reveal more code if I could, but my company's intellectual-property policy doesn't allow it, so all of the above code is merely an example.

Can anyone with more knowledge of the subject help me understand this error and point me in the right direction to debug and fix it? Unfortunately, I can't use a Windows machine for debugging: I develop on Linux, then commit changes to a Gerrit server, which triggers builds and tests through Jenkins. I do have access to the console output from the compile and test steps.

I did consider using non-STL types, copying the vector into a char** in C++, but the memory allocation was a nightmare: I was struggling to get it working nicely on Linux, let alone on Windows with its horrible multiple heaps.
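For reference, the usual way to make a char**-style interface manageable is to let the library own both ends: it allocates the array and also exports the matching function that frees it, while the caller copies what it needs into its own std::strings before handing the memory back. The sketch below is a minimal illustration, assuming hypothetical exported names (get_all_files_c, free_file_list) and a stand-in g_paths list in place of file_structs. Both halves sit in one file only so that it compiles on its own; in the real setup the executable would obtain the two functions through GET_METHOD.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// ---- Library (DLL) side. All names here are hypothetical. ----

// Stand-in for the library's real data (file_structs in the question).
static const std::vector<std::string> g_paths = {"/data/a.txt", "/data/b/c.txt"};

// Frees a list returned by get_all_files_c. Because this is compiled into the
// library, free() here uses the same heap that malloc()/calloc() used below.
extern "C" void free_file_list(char** files)
{
    if (!files) return;
    for (char** p = files; *p != nullptr; ++p) std::free(*p);
    std::free(files);
}

// Returns a NULL-terminated array of C strings allocated by the library, or
// nullptr on allocation failure. The caller must release it with free_file_list().
extern "C" char** get_all_files_c(std::size_t* count)
{
    *count = 0;
    // calloc zero-fills, so the array stays NULL-terminated at every step.
    char** files = static_cast<char**>(std::calloc(g_paths.size() + 1, sizeof(char*)));
    if (!files) return nullptr;
    for (std::size_t i = 0; i < g_paths.size(); ++i) {
        const std::size_t len = g_paths[i].size() + 1;  // include the '\0'
        files[i] = static_cast<char*>(std::malloc(len));
        if (!files[i]) { free_file_list(files); return nullptr; }
        std::memcpy(files[i], g_paths[i].c_str(), len);
    }
    *count = g_paths.size();
    return files;
}

// ---- Executable side ----

// Copies the list into caller-owned std::strings, then hands the raw memory
// straight back to the library, so no block is freed on the wrong heap.
std::vector<std::string> load_file_list()
{
    std::size_t n = 0;
    char** raw = get_all_files_c(&n);
    if (!raw) return {};
    std::vector<std::string> files(raw, raw + n);
    free_file_list(raw);
    return files;
}
```

The design choice is that no pointer produced by the library is ever passed to the executable's free() or delete, and only C types cross the boundary, which also sidesteps most of the compiler- and settings-matching problems.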

EDIT: It definitely crashes as soon as the files vector goes out of scope. My current thought is that the strings put into the vector are allocated on the DLL heap and deallocated on the executable heap. If this is the case, can anyone suggest a better solution?
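One commonly used alternative that avoids passing ownership in either direction is a callback: the library hands out each path as a borrowed const char*, and the executable copies it into a vector it owns, so every std::string and vector buffer is allocated and freed by the executable. A minimal sketch, with for_each_file, file_visitor, append_path and collect_files as hypothetical names and g_paths standing in for file_structs:

```cpp
#include <string>
#include <vector>

// Callback type shared by both modules (hypothetical). Only C types cross the
// boundary: a borrowed, read-only C string plus an opaque user pointer.
typedef void (*file_visitor)(const char* path, void* user);

// ---- Library (DLL) side ----

// Stand-in for the library's real data (file_structs in the question).
static const std::vector<std::string> g_paths = {"/data/a.txt", "/data/b/c.txt"};

// Calls visit once per path. The pointer is only valid during the call;
// the library keeps ownership of the characters.
extern "C" void for_each_file(file_visitor visit, void* user)
{
    for (const std::string& p : g_paths) visit(p.c_str(), user);
}

// ---- Executable side ----

// Appends to a vector owned by the executable, so the string copy and any
// vector growth use the executable's allocator.
static void append_path(const char* path, void* user)
{
    static_cast<std::vector<std::string>*>(user)->push_back(path);
}

std::vector<std::string> collect_files()
{
    std::vector<std::string> files;
    for_each_file(&append_path, &files);
    return files;
}
```

Exceptions should not be allowed to propagate out of the callback back through the library; if push_back can throw (for example std::bad_alloc), it is safer to catch it inside the callback and record the failure through the user pointer.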

asked Jul 26 '13 at 15:07 by SmallDeadGuy



3 Answers

Your main problem is that passing C++ types across DLL boundaries is difficult. For it to work, you need the following:

  1. Same compiler
  2. Same standard library
  3. Same settings for exceptions
  4. In Visual C++, you need the same version of the compiler
  5. In Visual C++, you need the same Debug/Release configuration
  6. In Visual C++, you need the same iterator debug level (_ITERATOR_DEBUG_LEVEL)

And so on.
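Since a mismatch in any of these settings tends to surface only as a crash at run time, one option is to have the library export a description of how it was built and compare it on the executable side before exchanging any std:: types. A sketch, assuming a hypothetical exported function file_manager_build_signature and a make_build_signature helper that would live in a header both projects include. The macros checked (__cplusplus, _MSC_VER, __GNUC__, _ITERATOR_DEBUG_LEVEL, _DEBUG, _DLL) are real predefined or library-defined macros, but the selection is illustrative and not exhaustive:

```cpp
#include <cstring>
#include <string>

// Describes build settings that must match on both sides of the boundary.
// Which settings matter is compiler-specific; this list is only an example.
static std::string make_build_signature()
{
    std::string sig = "cplusplus=" + std::to_string(__cplusplus);
#if defined(_MSC_VER)
    sig += ";msc=" + std::to_string(_MSC_VER);
#endif
#if defined(__GNUC__)
    // GCC, or a compiler that reports GCC compatibility (e.g. Clang)
    sig += ";gnuc=" + std::to_string(__GNUC__) + "." + std::to_string(__GNUC_MINOR__);
#endif
#if defined(_ITERATOR_DEBUG_LEVEL)
    sig += ";idl=" + std::to_string(_ITERATOR_DEBUG_LEVEL);
#endif
#if defined(_DEBUG)
    sig += ";debug=1";
#else
    sig += ";debug=0";
#endif
#if defined(_DLL)
    sig += ";crt=dll";  // MSVC: /MD or /MDd (shared CRT)
#endif
    return sig;
}

// Library side (hypothetical export). Returns a pointer to storage owned by
// the library, so the caller never has to free anything.
extern "C" const char* file_manager_build_signature()
{
    static const std::string sig = make_build_signature();
    return sig.c_str();
}

// Executable side: check this before passing any std::vector/std::string across.
bool library_is_compatible(const char* library_signature)
{
    return library_signature != nullptr &&
           std::strcmp(library_signature, make_build_signature().c_str()) == 0;
}
```

A matching signature only shows that the checked settings agree; it does not prove that the two modules are ABI-compatible in every other respect.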

If that is what you want, I wrote a header-only library called cppcomponents (https://github.com/jbandela/cppcomponents) that provides the easiest way to do it in C++. You need a compiler with strong C++11 support: GCC 4.7.2 or 4.8 will work, and the Visual C++ 2013 Preview also works.

I will walk you through using cppcomponents to solve your problem.

  1. git clone https://github.com/jbandela/cppcomponents.git in the directory of your choice. We will refer to the directory where you ran this command as localgit

  2. Create a file called interfaces.hpp. In this file you will define the interface that can be used across compilers.

Enter the following:

#include <cppcomponents/cppcomponents.hpp>

using cppcomponents::define_interface;
using cppcomponents::use;
using cppcomponents::runtime_class;
using cppcomponents::use_runtime_class;
using cppcomponents::implement_runtime_class;
using cppcomponents::uuid;
using cppcomponents::object_interfaces;

struct IGetFiles:define_interface<uuid<0x633abf15,0x131e,0x4da8,0x933f,0xc13fbd0416cd>>{

    std::vector<std::string> GetFiles();

    CPPCOMPONENTS_CONSTRUCT(IGetFiles,GetFiles);


};

inline std::string FilesId(){return "Files!Files";}
typedef runtime_class<FilesId,object_interfaces<IGetFiles>> Files_t;
typedef use_runtime_class<Files_t> Files;

Next, create an implementation. To do this, create Files.cpp.

Add the following code:

#include "interfaces.hpp"


struct ImplementFiles:implement_runtime_class<ImplementFiles,Files_t>{
  std::vector<std::string> GetFiles(){
    std::vector<std::string> ret = {"samplefile1.h", "samplefile2.cpp"};
    return ret;

  }

  ImplementFiles(){}


};

CPPCOMPONENTS_DEFINE_FACTORY();

Finally, here is the file that uses the above. Create UseFiles.cpp.

Add the following code:

#include "interfaces.hpp"
#include <iostream>

int main(){

  Files f;
  auto vec_files = f.GetFiles();
  for(auto& name:vec_files){
      std::cout << name << "\n";
    }

}

Now you can compile. Just to show that this is compatible across compilers, we will use cl, the Visual C++ compiler, to compile UseFiles.cpp into UseFiles.exe, and MinGW GCC to compile Files.cpp into Files.dll.

cl /EHsc UseFiles.cpp /I localgit\cppcomponents

where localgit is the directory in which you ran git clone as described above

g++ -std=c++11 -shared -o Files.dll Files.cpp -I localgit\cppcomponents

There is no link step. Just make sure Files.dll and UseFiles.exe are in the same directory.

Now run the executable, UseFiles.

cppcomponents also works on Linux. The main change is that when you compile the executable you need to add -ldl to the flags, and when you compile the .so file you need to add -fPIC.

If you have further questions, let me know.

answered Nov 08 '22 at 16:11 by John Bandela



Everybody seems to be hung up on the infamous DLL-compiler-incompatibility issue here, but I think you are right about this being related to the heap allocations. I suspect what is happening is that the vector (allocated in main exe's heap space) contains strings allocated in the DLL's heap space. When the vector goes out of scope and is deallocated, it's also attempting to deallocate the strings - and all this is happening on the .exe side, which causes the crash.

I have two instinctive suggestions:

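  1. Wrap each string in a smart pointer whose deleter is chosen on the DLL side, so that when the vector goes out of scope and the destructors of its contents run, the strings are deallocated by DLL code and no heap conflict occurs. A plain std::unique_ptr<string> does not achieve this: its default deleter (std::default_delete) is stateless and part of the type, so the delete is compiled into whichever module destroys the pointer, i.e. the .exe. A std::shared_ptr does: its deleter is stored in a control block created on the DLL side, so DLL code frees the string (the DLL must stay loaded until the last shared_ptr is gone). On its own this does not cover the vector's own buffer, which push_back may still reallocate inside the DLL.

    extern "C" FILE_MANAGER_EXPORT void get_all_files(vector<shared_ptr<string>>& files)
    {
        files.clear();
        for (vector<file_struct>::iterator i = file_structs.begin(); i != file_structs.end(); ++i)
        {
            // The control block, and with it the deleter, is created here inside the DLL
            files.push_back(shared_ptr<string>(new string(i->full_path)));
        }
    }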
    
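  2. Keep the vector on the DLL side and just return a reference to it. You can pass the reference across the DLL boundary, as long as the executable only reads from the vector (copying out whatever it needs) and never modifies or resizes it, since any reallocation would then happen on the executable's side: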

    vector<string> files;
    
    extern "C" FILE_MANAGER_EXPORT vector<string>& get_all_files()
    {
        files.clear();
        for (vector<file_struct>::iterator i = file_structs.begin(); i != file_structs.end(); ++i)
        {
            files.push_back(i->full_path);
        }
        return files;
    }
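The idea behind suggestion 1 is that the deleter must travel with the pointer. The single-file sketch below shows the std::shared_ptr property this relies on: the deleter is picked when the pointer is created and stored in the control block, so code that only sees std::shared_ptr<std::string> still runs that deleter. With std::unique_ptr<std::string>, by contrast, the deleter is part of the type (std::default_delete by default), so the delete is compiled wherever the pointer is destroyed. The function names and the g_deleter_calls counter are illustrative only; the counter just makes the deleter call observable.

```cpp
#include <memory>
#include <string>
#include <vector>

// Counts calls to the custom deleter below (illustration only).
static int g_deleter_calls = 0;

// Stand-in for code compiled into the library: the deleter is chosen here,
// at construction time, and stored in the shared_ptr's control block.
std::shared_ptr<std::string> make_path_in_library(const std::string& path)
{
    return std::shared_ptr<std::string>(new std::string(path), [](std::string* p) {
        ++g_deleter_calls;  // in a real DLL, this code would run inside the library
        delete p;
    });
}

// Stand-in for the executable: its std::shared_ptr<std::string> type says
// nothing about the deleter, yet destroying the last copy still calls it.
int destroy_in_executable()
{
    g_deleter_calls = 0;
    {
        std::vector<std::shared_ptr<std::string>> files;
        files.push_back(make_path_in_library("/data/a.txt"));
        files.push_back(make_path_in_library("/data/b/c.txt"));
    }  // the vector and both shared_ptrs are destroyed here
    return g_deleter_calls;
}
```

Calling destroy_in_executable() returns 2: both strings were released through the deleter captured when they were created.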
    

Semi-related: the question “Downcasting” unique_ptr<Base> to unique_ptr<Derived> (across DLL boundary).

answered Nov 08 '22 at 15:11 by d7samurai



"The memory is allocated in the executable and passed as a reference to the dll function, values are added via the reference, and then those are processed and finally deallocated back in the executable."

Adding a value when there is no spare capacity triggers a reallocation: the old buffer is deallocated and a new one is allocated. That happens inside std::vector::push_back as compiled into the library, so it uses the library's memory allocator.
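This can be observed directly with a counting allocator: push_back itself performs the allocations (and frees the old buffers) as the vector grows, while reserving capacity up front keeps the loop from allocating at all. In the question's setup, reserving on the executable side would keep the vector's own buffer from being reallocated inside the DLL, although each std::string's character buffer (for anything longer than the implementation's small-string buffer) would still be allocated there. CountingAllocator and allocations_during_push_back are hypothetical helpers written for this illustration:

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Tally of every allocation made through CountingAllocator.
static std::size_t g_allocs = 0;

// Minimal C++11 allocator that counts calls to allocate().
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U> CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        ++g_allocs;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        ::operator delete(p);  // old buffers are released here during growth
    }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Returns how many allocations the push_back loop itself performed.
std::size_t allocations_during_push_back(int n, bool reserve_first)
{
    std::vector<int, CountingAllocator<int>> v;
    if (reserve_first) v.reserve(static_cast<std::size_t>(n));  // allocate up front
    const std::size_t before = g_allocs;
    for (int i = 0; i < n; ++i) v.push_back(i);
    return g_allocs - before;
}
```

Without the reserve, pushing 100 elements causes several allocations inside push_back (the exact number depends on the implementation's growth factor); with it, the loop performs none.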

Beyond that, there is the obvious requirement that the compile settings match exactly, and which settings matter is compiler-specific. You will most likely have to keep the two builds in sync.

answered Nov 08 '22 at 16:11 by dascandy
