c++ program crashes when linked to two 3rd party shared libraries

I have two outsourced shared libraries for the Linux platform (no source, no documentation). Each library works fine when it is linked into a program on its own (g++ xx.cpp lib1.so, or g++ xx.cpp lib2.so).

However, when any C++ program is linked against both shared libraries at the same time (g++ xx.cpp lib1.so lib2.so), the program inevitably crashes with a "double free" error.

Even if the C++ program is a trivial hello-world program that uses nothing from these libraries, it still crashes.

#include <iostream>
using namespace std;
int main(){
     cout<<"haha, I crash again. Catch me if you can"<<endl;
     return 0;
}

Makefile:

g++ helloworld.cpp lib1.so lib2.so

I have a hunch that lib1.so and lib2.so share some common global variable and destroy it twice. I have tried gdb and valgrind, but could not extract useful information from the backtrace.

Is there any way to isolate these two shared libraries from each other so that they work in a sandboxed manner?

EDITED (adding core dump and gdb backtrace):

I linked the toy hello-world program above with the two libraries (platform: CentOS 7.0 64-bit with gcc 4.8.2):

g++ helloworld.cpp  lib1.so lib2.so -o check

Valgrind:

==29953== Invalid free() / delete / delete[] / realloc()
==29953==    at 0x4C29991: operator delete(void*) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==29953==    by 0x613E589: __cxa_finalize (in /usr/lib64/libc-2.17.so)
==29953==    by 0x549B725: ??? (in /home/fanbin/InventoryManagment/lib1.so)
==29953==    by 0x5551720: ??? (in /home/fanbin/InventoryManagment/lib1.so)
==29953==    by 0x613E218: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==29953==    by 0x613E264: exit (in /usr/lib64/libc-2.17.so)
==29953==    by 0x6126AFB: (below main) (in /usr/lib64/libc-2.17.so)
==29953==  Address 0x6afb780 is 0 bytes inside a block of size 624 free'd
==29953==    at 0x4C29991: operator delete(void*) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==29953==    by 0x613E589: __cxa_finalize (in /usr/lib64/libc-2.17.so)
==29953==    by 0x4F07AC5: ??? (in /home/fanbin/InventoryManagment/lib2.so)
==29953==    by 0x5039900: ??? (in /home/fanbin/InventoryManagment/lib2.so)
==29953==    by 0x613E218: __run_exit_handlers (in /usr/lib64/libc-2.17.so)
==29953==    by 0x613E264: exit (in /usr/lib64/libc-2.17.so)
==29953==    by 0x6126AFB: (below main) (in /usr/lib64/libc-2.17.so)

gdb backtrace message:

(gdb) bt
#0  0x00007ffff677d989 in raise () from /lib64/libc.so.6
#1  0x00007ffff677f098 in abort () from /lib64/libc.so.6
#2  0x00007ffff67be197 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff67c556d in _int_free () from /lib64/libc.so.6
#4  0x00007ffff7414aa2 in __tcf_0 () from ./lib1.so
#5  0x00007ffff678158a in __cxa_finalize () from /lib64/libc.so.6
#6  0x00007ffff739f726 in __do_global_dtors_aux () from ./lib1.so
#7  0x0000000000600dc8 in __init_array_start ()
#8  0x00007fffffffe2c0 in ?? ()
#9  0x00007ffff7455721 in _fini () from ./lib1.so
#10 0x00007fffffffe2c0 in ?? ()
#11 0x00007ffff7debb98 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

update

Thanks to @RaduChivu's help, I found a very similar scenario: "segmentation fault at __tcf_0 when program exits". It looks like there is indeed a global variable collision between the two libraries. Since I do not have the source files for these two external shared libraries, is there any way to resolve this conflict other than using two separate processes?

fanbin asked Jul 31 '14 05:07


2 Answers

I solved this problem after a day of searching, and I am leaving a note here in case anyone else encounters it in the future.

Explanation

It turns out that @RaduChivu's guess and mine were correct: the two shared libraries apparently share a common global variable. Even when an empty program is linked against both shared libraries, the program tries to release that common global variable twice as it exits, which causes the double-free corruption.

The clue comes from this frame in the gdb backtrace:

#4  0x00007ffff7414aa2 in __tcf_0 () from ./lib1.so

As described in this thread:

What is function __tcf_0? (Seen when using gprof and g++),

__tcf_0 is a function generated by g++ to destroy a static object when exit() is called. This frame hints that the double free occurs when one shared library runs its exit-time destructors after the other has already done so.

Since these two libraries are designed to work together, this corruption is an unacceptable engineering disaster. How can such an obvious, low-quality bug survive five version releases? Probably because the majority of the library's users work on the Windows platform, where the package works fine. That assumption gives another hint about the origin of the mistake: the shared libraries work well on Windows but crash on Linux, so some OS-dependent difference in behavior must be causing the bug. This thread provides some insight:

Global variable has multiple copies on Windows and a single on Linux when compiled in both exec and shared libaray.

In short, "extern" globals from shared libraries get a single copy on Linux, but multiple copies on Windows.

Solution

(1) The natural workaround is to create two processes, each linked to only one of the libraries.

(2) @DavidSchwartz suggests another workaround: call _exit(0) at the end of the program instead of the usual "return 0" or "exit(0)". It works. According to

What is the difference between using _exit() & exit() in a conventional Linux fork-exec?

one must then manually flush open files and take care of whatever the atexit handlers would have done. As for memory, since the program is exiting, the OS reclaims all of the process's memory anyway, so there is nothing to worry about.

(3) Another way is to load each library with dlopen("xx.so", RTLD_LAZY | RTLD_LOCAL), which hides all of its symbols from other libraries, and then manually dlsym the function symbols you need

(@JonathanWakely notes that RTLD_LOCAL has side effects; see his comment).
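For workaround (3), here is a minimal sketch of the dlopen/dlsym pattern. I cannot share the real libraries, so it uses libm.so.6 and its cos function as stand-ins; the helper name call_from_private_handle is made up for this note. With the real libraries you would pass "./lib1.so" and one of its (mangled) symbol names instead. On older glibc versions you also need to link with -ldl.

```cpp
#include <dlfcn.h>
#include <stdexcept>
#include <string>

// Signature of the stand-in function (cos). For a real library, use the
// actual signature of the function you are looking up.
using unary_fn = double (*)(double);

// Hypothetical helper: open `lib` with RTLD_LOCAL so its symbols are not
// made available to other libraries, look up `symbol`, call it, and close.
double call_from_private_handle(const char* lib, const char* symbol, double arg) {
    void* handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        throw std::runtime_error(dlerror());
    }

    dlerror();  // clear any stale error before dlsym
    void* sym = dlsym(handle, symbol);
    if (const char* err = dlerror()) {
        std::string msg(err);
        dlclose(handle);
        throw std::runtime_error(msg);
    }

    unary_fn fn = reinterpret_cast<unary_fn>(sym);
    double result = fn(arg);
    dlclose(handle);
    return result;
}
```

For example, call_from_private_handle("libm.so.6", "cos", 0.0) looks up cos through the private handle and calls it. Whether RTLD_LOCAL alone is enough to stop two libraries from sharing a global depends on how they were built, so treat this as a starting point rather than a guaranteed fix.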

In this particular case, the library authors did not even use extern "C" in their shared libraries, so the mangled names in the .so files are quite unreadable. If anyone else has the pleasure of dealing with this, the following thread may help:

Getting undefined symbol error while dynamic loading of shared library

If your shared libraries are poorly supported, as in my case, a solution is still possible. I manually listed all the functions I needed, used nm to find the corresponding symbol for each one in the .so files, linked them one by one, and it worked.
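When matching nm output against the functions you need, it helps to turn a mangled name back into something readable. A minimal sketch using abi::__cxa_demangle from g++'s <cxxabi.h>; the helper name demangle and the sample symbol _Z3fooi are made up for illustration and do not come from lib1.so or lib2.so.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: return the human-readable form of a mangled C++
// symbol, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so it must be released with free.
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```

For example, demangle("_Z3fooi") gives "foo(int)". Note that dlsym still needs the mangled name; the demangled form is only for finding the right entry in the nm listing. From the command line, nm -C or c++filt does the same job.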

fanbin answered Oct 22 '22 06:10


One possible solution would be to never call exit. To terminate your program, just call _exit. If there's anything specific you need to do that would normally be done by exit, just do it yourself before calling _exit.
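To illustrate the difference, here is a minimal sketch, assuming a Linux/POSIX system. It forks a child that creates a static object and then terminates with either exit or _exit; the destructor reports that it ran by writing a byte to a pipe. The names Reporter and static_destructor_runs are made up for this example and have nothing to do with the libraries in the question.

```cpp
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper type: its destructor writes one byte to a pipe so the
// parent process can tell whether the destructor ran.
struct Reporter {
    int fd;
    explicit Reporter(int f) : fd(f) {}
    ~Reporter() {
        char c = 'd';
        ssize_t n = write(fd, &c, 1);
        (void)n;  // nothing useful to do on failure in this sketch
    }
};

// Returns true if the child's static destructor ran before the child ended.
bool static_destructor_runs(bool use_underscore_exit) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    std::fflush(nullptr);  // avoid duplicating buffered output in the child
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    if (pid == 0) {
        close(fds[0]);
        // The destructor of a static object is registered to run at exit().
        static Reporter reporter(fds[1]);
        if (use_underscore_exit) {
            _exit(0);      // skips atexit handlers and static destructors
        }
        std::exit(0);      // runs atexit handlers and static destructors
    }

    close(fds[1]);
    char c = 0;
    ssize_t n = read(fds[0], &c, 1);  // 1 byte if the destructor ran, 0 at EOF
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return n == 1;
}
```

static_destructor_runs(false) reports that the destructor ran, while static_destructor_runs(true) reports that it did not. Skipping those exit-time destructors is what avoids the double free, and it is also why anything they would normally do, such as flushing stdio buffers, has to be done by hand before calling _exit.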

David Schwartz answered Oct 22 '22 07:10