
dlopen malloc deadlock

We have some unit tests that often deadlock. Closer inspection with GDB reveals the following:

Thread 1:

(gdb) bt
#0  0x00110424 in __kernel_vsyscall ()
#1  0x00c681a3 in __lll_lock_wait_private () from /lib/libc.so.6
#2  0x00bf09fb in _L_lock_515 () from /lib/libc.so.6
#3  0x00bf068c in tr_mallochook () from /lib/libc.so.6
#4  0x00bece22 in calloc () from /lib/libc.so.6
#5  0x00b5ed93 in _dl_new_object () from /lib/ld-linux.so.2
#6  0x00b5b287 in _dl_map_object_from_fd () from /lib/ld-linux.so.2
#7  0x00b5c521 in _dl_map_object () from /lib/ld-linux.so.2
#8  0x00b66f43 in dl_open_worker () from /lib/ld-linux.so.2
#9  0x00b629a6 in _dl_catch_error () from /lib/ld-linux.so.2
#10 0x00b66a06 in _dl_open () from /lib/ld-linux.so.2
#11 0x00d38c3b in dlopen_doit () from /lib/libdl.so.2
#12 0x00b629a6 in _dl_catch_error () from /lib/ld-linux.so.2
#13 0x00d3903c in _dlerror_run () from /lib/libdl.so.2
#14 0x00d38b71 in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
...

Thread 2:

#0  0x00110424 in __kernel_vsyscall ()
#1  0x00d4c059 in __lll_lock_wait () from /lib/libpthread.so.0
#2  0x00d4740e in _L_lock_752 () from /lib/libpthread.so.0
#3  0x00d4731a in pthread_mutex_lock () from /lib/libpthread.so.0
#4  0x00c95dd2 in _dl_addr () from /lib/libc.so.6
#5  0x00bf0425 in tr_where () from /lib/libc.so.6
#6  0x00bf06bd in tr_mallochook () from /lib/libc.so.6
#7  0x00bed01b in malloc () from /lib/libc.so.6
....

I have searched the Internet extensively, but I cannot tell whether I am doing something wrong or whether I have found a bug in the libraries.

Paul Praet asked Aug 14 '12 14:08



1 Answer

glibc's dlopen() code doesn't seem to be thread safe.

It looks like your code calls malloc() and dlopen() concurrently from two threads. It also looks like the malloc() call hits an unresolved dynamic symbol and tries to resolve it using _dl_addr(). That would imply that the binary you are executing was linked with lazy binding (the default ld behaviour), so the runtime linker resolves symbols on demand at the first call. Try linking with the gcc option -Wl,-z,now, which makes the runtime linker resolve all symbols before your application starts.
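To make the access pattern concrete, here is a minimal sketch of one thread calling dlopen() while another calls malloc(). All names in it (stress_args, dlopen_worker, malloc_worker, run_dlopen_malloc_stress, the file name stress.c) are hypothetical and not taken from the question. On its own this is not expected to hang: the backtraces above also go through the malloc tracing hook (tr_mallochook), so the tests were presumably running with malloc tracing enabled, which this sketch does not do.

```c
/* Hypothetical reproducer sketch; names are illustrative, not from the question.
 * Possible build line, using the eager-binding flag suggested above:
 *   gcc -pthread -Wl,-z,now stress.c -o stress -ldl
 */
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct stress_args {
    const char *lib;   /* library to load; NULL means the main program */
    int iterations;    /* number of loop iterations per worker */
    int failures;      /* count of failed dlopen()/malloc() calls */
};

/* Thread 1: repeatedly enters the dynamic loader via dlopen(). */
static void *dlopen_worker(void *p)
{
    struct stress_args *a = p;
    for (int i = 0; i < a->iterations; i++) {
        void *h = dlopen(a->lib, RTLD_LAZY);
        if (h == NULL) {
            a->failures++;
            continue;
        }
        dlclose(h);
    }
    return NULL;
}

/* Thread 2: repeatedly enters the allocator via malloc()/free(). */
static void *malloc_worker(void *p)
{
    struct stress_args *a = p;
    for (int i = 0; i < a->iterations; i++) {
        void *m = malloc(64);
        if (m == NULL) {
            a->failures++;
            continue;
        }
        *(volatile char *)m = 1;  /* keep the allocation from being optimized away */
        free(m);
    }
    return NULL;
}

/* Runs both workers concurrently.
 * Returns the total number of failed calls, or -1 if a thread could not start. */
int run_dlopen_malloc_stress(const char *lib, int iterations)
{
    struct stress_args d = { lib, iterations, 0 };
    struct stress_args m = { lib, iterations, 0 };
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, dlopen_worker, &d) != 0)
        return -1;
    if (pthread_create(&t2, NULL, malloc_worker, &m) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return d.failures + m.failures;
}
```

Calling run_dlopen_malloc_stress(NULL, 1000) exercises the main program's own handle. A real test would pass the name of the shared library that the unit tests actually load.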

This looks similar to a bug I have previously filed a report for.

Maxim Egorushkin answered Oct 12 '22 23:10
