Error loading shared libraries with dlopen()

I am working on a program that loads user-created plugins using dlopen on CentOS. I am running into a problem with a plugin that has dependencies on shared libraries that also have dependencies:

libplugin.so -> libservices.so -> libconfig.so

Our program loads the dependencies into memory first, starting at the leaves of the dependency tree and moving up to the plugin (error checking omitted in this example):

dlopen("/path_to_plugin/libconfig.so", RTLD_NOW | RTLD_GLOBAL);
dlopen("/path_to_plugin/libservices.so", RTLD_NOW | RTLD_GLOBAL);
dlopen("/path_to_plugin/libplugin.so", RTLD_NOW | RTLD_GLOBAL);

We use this approach so the end user does not have to modify their LD_LIBRARY_PATH to point to the directory with the plugins. This approach has worked successfully for several different plugins.
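For reference, here is a minimal sketch of the same leaf-first loading with the error checking put back in. The function name load_in_order and its signature are made up for illustration and are not taken from our actual program; the only real API used is dlopen/dlerror from <dlfcn.h>.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: dlopen each path in the order given (leaves first).
 * Stores the handles in handles[] and returns how many libraries were
 * loaded before the first failure (count means everything loaded). */
int load_in_order(const char *const paths[], void *handles[], int count)
{
    for (int i = 0; i < count; i++) {
        dlerror(); /* clear any stale error message */
        handles[i] = dlopen(paths[i], RTLD_NOW | RTLD_GLOBAL);
        if (handles[i] == NULL) {
            const char *msg = dlerror();
            fprintf(stderr, "dlopen(%s) failed: %s\n",
                    paths[i], msg ? msg : "unknown error");
            return i;
        }
    }
    return count;
}
```

Calling it with the three paths above, in that order, should print the same "cannot open shared object file" message from dlerror() when the second dlopen fails. On older glibc versions the program has to be linked with -ldl.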

We recently received a new plugin for which this approach does not work. We are able to load libconfig.so successfully, but when we try to load libservices.so, we get the following error message:

Exception libconfig.so: cannot open shared object file: No such file or directory

I know that the symbol dependencies between libraries are all satisfied, because when I set LD_LIBRARY_PATH to contain the plugin path, the plugin loads and executes correctly.

When I run strace on my program, I can see that the system is performing a search for libconfig.so as described in the dlopen man page. So it appears that, for some reason, dlopen is not detecting that libconfig.so has already been loaded. What conditions could cause this behavior?

Blake Nelson asked Jan 04 '15 19:01


1 Answer

What conditions could cause this behavior?

When you call dlopen("/path_to_plugin/libservices.so", ...), the loader does this:

  1. Open the given path and verify that it is a suitable ELF file with the correct architecture.
  2. Read the dynamic section of that file. For each DT_NEEDED entry (libconfig.so here):
  3. Scan the list of already opened DSOs, looking for an exact match (this fails for you).
    1. If found, increment the reference count.
    2. Otherwise, try to find the needed library on disk.
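To see which names the loader has actually recorded for the already opened DSOs, something like the following can be called from inside the process right before the failing dlopen. This is only a sketch built on dl_iterate_phdr (a glibc extension declared in <link.h>); the helper count_loaded_named and the struct name_query are hypothetical names, and comparing file names is a simplification, not how the loader itself does its matching.

```c
#define _GNU_SOURCE
#include <link.h>
#include <string.h>

/* Hypothetical query passed through dl_iterate_phdr's data pointer. */
struct name_query {
    const char *file_name; /* e.g. "libconfig.so" */
    int matches;
};

/* Callback: compare the last path component of each recorded name. */
static int name_query_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct name_query *q = data;
    (void)size;
    const char *name = info->dlpi_name ? info->dlpi_name : "";
    const char *slash = strrchr(name, '/');
    const char *base = slash ? slash + 1 : name;
    if (strcmp(base, q->file_name) == 0)
        q->matches++;
    return 0; /* 0 means keep iterating */
}

/* Returns how many loaded objects have the given file name. */
int count_loaded_named(const char *file_name)
{
    struct name_query q = { file_name, 0 };
    dl_iterate_phdr(name_query_cb, &q);
    return q.matches;
}
```

If count_loaded_named("libconfig.so") returns 0 just before libservices.so is opened, the library is no longer in the loader's list at all; if it returns 1, the entry is there and the question becomes why the lookup does not match it.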

Since step 3 is failing for you, it's a pretty safe bet that either something has corrupted the loader's list of already opened DSOs, or someone called dlclose on libconfig.so.

If GDB's info shared command still lists libconfig.so at step 3, then it's the former. If it doesn't, then it's the latter.

You should be able to verify corruption by looking at the _r_debug.r_map elements in GDB, and comparing the entries with the output of info shared.

The first entries in that list will have the main executable, the VDSO, and directly-linked shared libraries (e.g. libc.so.6 and libdl.so.2). Then you should see the entry for libconfig.so, except its l_name will probably be mangled in some way.
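If walking the list by hand in GDB is awkward, the same list can be dumped from inside the process. The sketch below assumes glibc, where <link.h> declares _r_debug and struct link_map; the function name dump_link_map is made up for illustration.

```c
#include <link.h>
#include <stdio.h>

/* Hypothetical helper: print every entry reachable from _r_debug.r_map
 * and return the number of entries seen. */
int dump_link_map(FILE *out)
{
    int n = 0;
    for (struct link_map *m = _r_debug.r_map; m != NULL; m = m->l_next) {
        fprintf(out, "%2d: l_addr=%#lx l_name=\"%s\"\n",
                n, (unsigned long)m->l_addr,
                m->l_name ? m->l_name : "(null)");
        n++;
    }
    return n;
}
```

Calling dump_link_map(stderr) once right after libconfig.so is opened and again just before libservices.so is opened should show whether the l_name of the libconfig.so entry changed in between. Since the code references _r_debug, it also has the side effect of making that symbol's debug info available in GDB.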

If this is indeed the case, you can find who corrupts the loader list by setting a breakpoint on dlopen("/path/to/libconfig.so",...), verifying that the r_map is correct at that point, and then setting a watchpoint on the memory that gets corrupted later.

On the other hand, if you have a rogue dlclose somewhere, then just setting a breakpoint on dlclose should quickly lead you to the problem.

Update:

I'm having a difficult time figuring out how to see the contents of _r_debug->r_map

There are two ways to get access to it:

  1. Install debuginfo packages for your GLIBC. On Ubuntu, apt-get install libc6-dbg should do, or
  2. Compile your program in such a way that the debug info for _r_debug is included in it. For example:

    cat t.c
    int main() { return 0; }
    
    gcc -g t.c && gdb -q ./a.out
    (gdb) start
    Temporary breakpoint 1 at 0x4004e1: file t.c, line 1.
    Starting program: /tmp/a.out
    
    Temporary breakpoint 1, main () at t.c:1
    1   int main() { return 0; }
    (gdb) p _r_debug
    $1 = 1    # The program does not reference _r_debug itself,
              # and debuginfo is not installed. This is probably what you see.
    

Let's fix that:

cat t2.c
#include <link.h>

int main() { return _r_debug.r_version; }  // reference needed for debug info

gcc -g t2.c && gdb -q ./a.out
(gdb) start
Temporary breakpoint 1 at 0x400561: file t2.c, line 3.
Starting program: /tmp/a.out

Temporary breakpoint 1, main () at t2.c:3
3   int main() { return _r_debug.r_version; }
(gdb) p _r_debug
$1 = {r_version = 1, r_map = 0x7ffff7ffe1c8, r_brk = 140737351960640, r_state = RT_CONSISTENT, r_ldbase = 140737351884800}
(gdb) p _r_debug.r_map[0]
$2 = {l_addr = 0, l_name = 0x7ffff7df6c3d "", l_ld = 0x600e18, l_next = 0x7ffff7ffe758, l_prev = 0x0}

Employed Russian answered Oct 08 '22 19:10