
Is it possible to merge weak symbols like vtables/typeinfo across RTLD_LOCAL'ly loaded libraries?

For context: I have a Java project that is partially implemented with two JNI libraries. For the sake of example, libbar.so depends on libfoo.so. If these were system libraries,

System.loadLibrary("bar");

would do the trick. But since they're custom libraries I'm shipping with my JAR, I have to do something like

System.load("/path/to/libfoo.so");
System.load("/path/to/libbar.so");

libfoo needs to go first because otherwise libbar can't find it, as it's not in the system library search path.

This has been working well for a while, but I've now run into an issue where std::any_cast is throwing std::bad_any_cast despite the types being correct. I tracked it down to the fact that each library has its own copy of the typeinfo for that type, and the copies are not being merged at runtime. This seems to be because System.load() ends up invoking dlopen() with RTLD_LOCAL rather than RTLD_GLOBAL.
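
To make the failure mode concrete, here is a minimal single-process sketch (no shared libraries involved) of how std::any_cast reacts when the requested type does not match the identity of the stored type. The struct foo and the helpers holds and try_cast are hypothetical names used only for illustration; how an implementation decides that two types are "the same" is an implementation detail.

```cpp
#include <any>
#include <string>

// Stand-in for the type that crosses the library boundary (hypothetical).
struct foo { int value = 0; };

// Returns true if any_cast<T> would succeed for `a`.
// The pointer overload of std::any_cast returns nullptr on a type
// mismatch instead of throwing std::bad_any_cast.
template <typename T>
bool holds(const std::any& a) {
    return std::any_cast<T>(&a) != nullptr;
}

// Runs the throwing overload of any_cast<T> and reports the outcome.
template <typename T>
std::string try_cast(const std::any& a) {
    try {
        (void)std::any_cast<T>(a);
        return "ok";
    } catch (const std::bad_any_cast&) {
        return "bad_any_cast";
    }
}
```

In the two-library case the cast fails in the same way even though both sides spell the type foo, because each library is working from its own typeinfo.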

I wrote this to demonstrate the behaviour without needing JNI:

foo.hpp

class foo { };

extern "C" const void* libfoo_foo_typeinfo();

foo.cpp

#include "foo.hpp"
#include <typeinfo>

extern "C" const void* libfoo_foo_typeinfo()
{
    return &typeid(foo);
}

bar.cpp

#include "foo.hpp"
#include <typeinfo>

extern "C" const void* libbar_foo_typeinfo()
{
    return &typeid(foo);
}

main.cpp

#include <iostream>
#include <typeinfo>
#include <dlfcn.h>

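int main() {
    // Load both libraries with RTLD_LOCAL, as System.load() appears to do.
    void* libfoo = dlopen("./libfoo.so", RTLD_NOW | RTLD_LOCAL);
    if (!libfoo) {
        std::cerr << "dlopen: " << dlerror() << "\n";
        return 1;
    }
    void* libbar = dlopen("./libbar.so", RTLD_NOW | RTLD_LOCAL);
    if (!libbar) {
        std::cerr << "dlopen: " << dlerror() << "\n";
        return 1;
    }

    // Look up the accessor exported by each library.
    auto libfoo_fn = reinterpret_cast<const void* (*)()>(
        dlsym(libfoo, "libfoo_foo_typeinfo"));
    auto libbar_fn = reinterpret_cast<const void* (*)()>(
        dlsym(libbar, "libbar_foo_typeinfo"));
    if (!libfoo_fn || !libbar_fn) {
        std::cerr << "dlsym failed\n";
        return 1;
    }

    auto libfoo_ti = static_cast<const std::type_info*>(libfoo_fn());
    auto libbar_ti = static_cast<const std::type_info*>(libbar_fn());

    // First line: same type_info object? Second line: operator== on type_info.
    std::cout << std::boolalpha
              << (libfoo_ti == libbar_ti) << "\n"
              << (*libfoo_ti == *libbar_ti) << "\n";
    return 0;
}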

Makefile

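all: libfoo.so libbar.so main

libfoo.so: foo.cpp
	$(CXX) -fpic -shared -Wl,-soname=$@ $^ -o $@

# libfoo.so is an order-only prerequisite: it must exist before linking
# with -lfoo, but it is kept out of $^.
libbar.so: bar.cpp | libfoo.so
	$(CXX) -fpic -shared -Wl,-soname=$@ $^ -L. -lfoo -o $@

main: main.cpp
	$(CXX) $^ -ldl -o $@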

On my system, I get

$ make
...
$ ./main
false
true

This is because even though the typeinfo addresses are different, GCC's libstdc++ uses the mangled names for equality. On LLVM's libc++, for example, equality is based on the typeinfo address itself, so I get:

$ make CXX="clang++ -stdlib=libc++"
$ ./main
false
false
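
The two equality strategies described above can be sketched as follows. This is only an illustration of the idea, not the actual libstdc++ or libc++ source; same_by_address and same_by_name are hypothetical helper names.

```cpp
#include <cstring>
#include <typeinfo>

// Address-based identity: two type_info objects are equal only if they are
// the very same object (the strategy the question attributes to libc++).
bool same_by_address(const std::type_info& a, const std::type_info& b) {
    return &a == &b;
}

// Name-based identity: compare the implementation-defined type names
// (the strategy the question attributes to libstdc++).
bool same_by_name(const std::type_info& a, const std::type_info& b) {
    return std::strcmp(a.name(), b.name()) == 0;
}
```

With two unmerged typeinfo objects for foo, one per library, the address comparison fails while the name comparison still succeeds, which matches the false/true and false/false outputs.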

If I pass RTLD_GLOBAL instead, I see

true
true
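
For reference, the only difference between the two experiments is the visibility flag passed to dlopen(). A small wrapper makes that explicit; open_library is a hypothetical helper, not part of any library, and on older glibc versions linking with -ldl may be required.

```cpp
#include <dlfcn.h>
#include <stdexcept>

// Opens a shared library with either global or local symbol visibility.
// RTLD_GLOBAL makes the library's symbols available for resolving
// references in libraries loaded later; RTLD_LOCAL does not.
// Throws std::runtime_error carrying the dlerror() message on failure.
void* open_library(const char* path, bool global) {
    const int flags = RTLD_NOW | (global ? RTLD_GLOBAL : RTLD_LOCAL);
    void* handle = dlopen(path, flags);
    if (!handle) {
        const char* err = dlerror();
        throw std::runtime_error(err ? err : "dlopen failed");
    }
    return handle;
}
```

Calling open_library("./libfoo.so", true) and then open_library("./libbar.so", true) corresponds to the RTLD_GLOBAL run; passing false corresponds to the original main.cpp.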

And if I edit main.cpp to load libbar.so first, it also works, provided I tell it where it can find libfoo.so:

$ LD_LIBRARY_PATH=. ./main
true
true

But for the reasons described at the top of this post, neither of these is a practical workaround.

This is very similar to https://github.com/android-ndk/ndk/issues/533 but with non-dynamic types, so there's no way to add a "key function" to force the typeinfo to be a strong symbol. I happened to reproduce the problem on Android first, but it isn't Android-specific.

Tavian Barnes asked Oct 28 '22 07:10


1 Answer

No, that is not possible. RTLD_LOCAL seeks to prevent exactly that, and unfortunately must be used for System.loadLibrary, since otherwise the symbols would clash if you System.loadLibrary two libraries that each define a different foo class.

Dan Albert answered Nov 15 '22 06:11