Here is the program that gives the segfault.
#include <iostream> #include <vector> #include <memory> int main() { std::cout << "Hello World" << std::endl; std::vector<std::shared_ptr<int>> y {}; std::cout << "Hello World" << std::endl; }
Of course, there is absolutely nothing wrong in the program itself. The root cause of the segfault depends on the environment in which its built and ran.
We, at Amazon, use a build system which builds and deploys the binaries (lib
and bin
) in an almost machine independent way. For our case, that basically means it deploys the executable (built from the above program) into $project_dir/build/bin/
and almost all its dependencies (i.e the shared libraries) into $project_dir/build/lib/
. Why I used the phrase "almost" is because for shared libraries such libc.so
, libm.so
, ld-linux-x86-64.so.2
and possibly few others, the executable picks from the system (i.e from /lib64
). Note that it is supposed to pick libstdc++
from $project_dir/build/lib
though.
Now I run it as follows:
$ LD_LIBRARY_PATH=$project_dir/build/lib ./build/bin/run segmentation fault
However if I run it, without setting the LD_LIBRARY_PATH
. It runs fine.
Here are ldd
informations for both cases (please note that I've edited the output to mention the full version of the libraries wherever there is difference)
$ LD_LIBRARY_PATH=$project_dir/build/lib ldd ./build/bin/run linux-vdso.so.1 => (0x00007ffce19ca000) libstdc++.so.6 => $project_dir/build/lib/libstdc++.so.6.0.20 libgcc_s.so.1 => $project_dir/build/lib/libgcc_s.so.1 libc.so.6 => /lib64/libc.so.6 libm.so.6 => /lib64/libm.so.6 /lib64/ld-linux-x86-64.so.2 (0x0000562ec51bc000)
and without LD_LIBRARY_PATH:
$ ldd ./build/bin/run linux-vdso.so.1 => (0x00007fffcedde000) libstdc++.so.6 => /usr/lib64/libstdc++.so.6.0.16 libgcc_s.so.1 => /lib64/libgcc_s-4.4.6-20110824.so.1 libc.so.6 => /lib64/libc.so.6 libm.so.6 => /lib64/libm.so.6 /lib64/ld-linux-x86-64.so.2 (0x0000560caff38000)
Program received signal SIGSEGV, Segmentation fault. 0x00007ffff7dea45c in _dl_fixup () from /lib64/ld-linux-x86-64.so.2 Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.209.62.al12.x86_64 (gdb) bt #0 0x00007ffff7dea45c in _dl_fixup () from /lib64/ld-linux-x86-64.so.2 #1 0x00007ffff7df0c55 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2 #2 0x00007ffff7b1dc41 in std::locale::_S_initialize() () from $project_dir/build/lib/libstdc++.so.6 #3 0x00007ffff7b1dc85 in std::locale::locale() () from $project_dir/build/lib/libstdc++.so.6 #4 0x00007ffff7b1a574 in std::ios_base::Init::Init() () from $project_dir/build/lib/libstdc++.so.6 #5 0x0000000000400fde in _GLOBAL__sub_I_main () at $project_dir/build/gcc-4.9.4/include/c++/4.9.4/iostream:74 #6 0x00000000004012ed in __libc_csu_init () #7 0x00007ffff7518cb0 in __libc_start_main () from /lib64/libc.so.6 #8 0x0000000000401021 in _start () (gdb)
I also tried to see the linker information by enabling LD_DEBUG=all
for the segfault case. I found something suspicious, as it searches for pthread_once
symbol, and when it unable to find this, it gives segfault (that is my interpretation of the following output snippet BTW):
initialize program: $project_dir/build/bin/run symbol=_ZNSt8ios_base4InitC1Ev; lookup in file=$project_dir/build/bin/run [0] symbol=_ZNSt8ios_base4InitC1Ev; lookup in file=$project_dir/build/lib/libstdc++.so.6 [0] binding file $project_dir/build/bin/run [0] to $project_dir/build/lib/libstdc++.so.6 [0]: normal symbol `_ZNSt8ios_base4InitC1Ev' [GLIBCXX_3.4] symbol=_ZNSt6localeC1Ev; lookup in file=$project_dir/build/bin/run [0] symbol=_ZNSt6localeC1Ev; lookup in file=$project_dir/build/lib/libstdc++.so.6 [0] binding file $project_dir/build/lib/libstdc++.so.6 [0] to $project_dir/build/lib/libstdc++.so.6 [0]: normal symbol `_ZNSt6localeC1Ev' [GLIBCXX_3.4] symbol=pthread_once; lookup in file=$project_dir/build/bin/run [0] symbol=pthread_once; lookup in file=$project_dir/build/lib/libstdc++.so.6 [0] symbol=pthread_once; lookup in file=$project_dir/build/lib/libgcc_s.so.1 [0] symbol=pthread_once; lookup in file=/lib64/libc.so.6 [0] symbol=pthread_once; lookup in file=/lib64/libm.so.6 [0] symbol=pthread_once; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
But I dont see any pthread_once
for the case when it runs successfully!
I know that its very difficult to debug like this and probably I've not given a lot of informations about the environments and all. But still, my question is: what could be the possible root-cause for this segfault? How to debug further and find that? Once I find the issue, fix would be easy.
I'm using GCC 4.9 on RHEL5.
If I comment the following line:
std::vector<std::shared_ptr<int>> y {};
It compiles and runs fine!
I just included the following header to my program:
#include <boost/filesystem.hpp>
and linked accordingly. Now it works without any segfault. So it seems by having a dependency on libboost_system.so.1.53.0.
, some requirements are met, or the problem is circumvented!
Since I saw it working when I made the executable to be linked against libboost_system.so.1.53.0
, so I did the following things step by step.
Instead of using #include <boost/filesystem.hpp>
in the code itself, I use the original code and ran it by preloading libboost_system.so
using LD_PRELOAD
as follows:
$ LD_PRELOAD=$project_dir/build/lib/libboost_system.so $project_dir/build/bin/run
and it ran successfully!
Next I did ldd
on the libboost_system.so
which gave a list of libs, two of which were:
/lib64/librt.so.1 /lib64/libpthread.so.0
So instead of preloading libboost_system
, I preload librt
and libpthread
separately:
$ LD_PRELOAD=/lib64/librt.so.1 $project_dir/build/bin/run $ LD_PRELOAD=/lib64/libpthread.so.0 $project_dir/build/bin/run
In both cases, it ran successfully.
Now my conclusion is that by loading either librt
or libpthread
(or both ), some requirements are met or the problem is circumvented! I still dont know the root cause of the issue, though.
Since the build system is complex and there are lots of options which are there by default. So I tried to explicitly add -lpthread
using CMake's set
command, then it worked, as we have already seen that by preloading libpthread
it works!
In order to see the build difference between these two cases (when-it-works and when-it-gives-segfault), I built it in verbose mode by passing -v
to GCC, to see the compilation stages and the options it actually passes to cc1plus
(compiler) and collect2
(linker).
(Note that paths has been edited for brevity, using dollar-sign and dummy paths.)
$/gcc-4.9.4/cc1plus -quiet -v -I /a/include -I /b/include -iprefix $/gcc-4.9.4/ -MMD main.cpp.d -MF main.cpp.o.d -MT main.cpp.o -D_GNU_SOURCE -D_REENTRANT -D __USE_XOPEN2K8 -D _LARGEFILE_SOURCE -D _FILE_OFFSET_BITS=64 -D __STDC_FORMAT_MACROS -D __STDC_LIMIT_MACROS -D NDEBUG $/lab/main.cpp -quiet -dumpbase main.cpp -msse -mfpmath=sse -march=core2 -auxbase-strip main.cpp.o -g -O3 -Wall -Wextra -std=gnu++1y -version -fdiagnostics-color=auto -ftemplate-depth=128 -fno-operator-names -o /tmp/ccxfkRyd.s
Irrespective of whether it works or not, the command-line arguments to cc1plus
are exactly the same. No difference at all. That does not seem to be very helpful.
The difference, however, is at the linking time. Here is what I see, for the case when it works:
$/gcc-4.9.4/collect2 -plugin $/gcc-4.9.4/liblto_plugin.so
-plugin-opt=$/gcc-4.9.4/lto-wrapper -plugin-opt=-fresolution=/tmp/cchl8RtI.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lpthread -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --eh-frame-hdr -m elf_x86_64 -export-dynamic -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o run /usr/lib/../lib64/crt1.o /usr/lib/../lib64/crti.o $/gcc-4.9.4/crtbegin.o -L/a/lib -L/b/lib -L/c/lib -lpthread --as-needed main.cpp.o -lboost_timer -lboost_wave -lboost_chrono -lboost_filesystem -lboost_graph -lboost_locale -lboost_thread -lboost_wserialization -lboost_atomic -lboost_context -lboost_date_time -lboost_iostreams -lboost_math_c99 -lboost_math_c99f -lboost_math_c99l -lboost_math_tr1 -lboost_math_tr1f -lboost_math_tr1l -lboost_mpi -lboost_prg_exec_monitor -lboost_program_options -lboost_random -lboost_regex -lboost_serialization -lboost_signals -lboost_system -lboost_unit_test_framework -lboost_exception -lboost_test_exec_monitor -lbz2 -licui18n -licuuc -licudata -lz -rpath /a/lib:/b/lib:/c/lib: -lstdc++ -lm -lgcc_s -lgcc -lpthread -lc -lgcc_s -lgcc $/gcc-4.9.4/crtend.o /usr/lib/../lib64/crtn.o
As you can see, -lpthread
is mentioned twice! The first -lpthread
(which is followed by --as-needed
) is missing for the case when it gives segfault. That is the only difference between these two cases.
nm -C
in both casesInterestingly, the output of nm -C
in both cases is identical (if you ignore the integer values in the first columns).
0000000000402580 d _DYNAMIC 0000000000402798 d _GLOBAL_OFFSET_TABLE_ 0000000000401000 t _GLOBAL__sub_I_main 0000000000401358 R _IO_stdin_used w _ITM_deregisterTMCloneTable w _ITM_registerTMCloneTable w _Jv_RegisterClasses U _Unwind_Resume 0000000000401150 W std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_destroy() 0000000000401170 W std::vector<std::shared_ptr<int>, std::allocator<std::shared_ptr<int> > >::~vector() 0000000000401170 W std::vector<std::shared_ptr<int>, std::allocator<std::shared_ptr<int> > >::~vector() 0000000000401250 W std::vector<std::unique_ptr<int, std::default_delete<int> >, std::allocator<std::unique_ptr<int, std::default_delete<int> > > >::~vector() 0000000000401250 W std::vector<std::unique_ptr<int, std::default_delete<int> >, std::allocator<std::unique_ptr<int, std::default_delete<int> > > >::~vector() U std::ios_base::Init::Init() U std::ios_base::Init::~Init() 0000000000402880 B std::cout U std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&) 0000000000402841 b std::__ioinit U std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) U operator delete(void*) U operator new(unsigned long) 0000000000401510 r __FRAME_END__ 0000000000402818 d __JCR_END__ 0000000000402818 d __JCR_LIST__ 0000000000402820 d __TMC_END__ 0000000000402820 d __TMC_LIST__ 0000000000402838 A __bss_start U __cxa_atexit 0000000000402808 D __data_start 0000000000401100 t __do_global_dtors_aux 0000000000402820 t __do_global_dtors_aux_fini_array_entry 0000000000402810 d __dso_handle 0000000000402828 t __frame_dummy_init_array_entry w __gmon_start__ U __gxx_personality_v0 0000000000402838 t __init_array_end 0000000000402828 t __init_array_start 00000000004012b0 T __libc_csu_fini 00000000004012c0 T __libc_csu_init U __libc_start_main w __pthread_key_create 0000000000402838 A _edata 0000000000402990 A _end 000000000040134c T _fini 0000000000400e68 T _init 0000000000401028 T _start 0000000000401054 t call_gmon_start 0000000000402840 b completed.6661 0000000000402808 W data_start 0000000000401080 t deregister_tm_clones 0000000000401120 t frame_dummy 0000000000400f40 T main 00000000004010c0 t register_tm_clones
Use debuggers to diagnose segfaults Start your debugger with the command gdb core , and then use the backtrace command to see where the program was when it crashed. This simple trick will allow you to focus on that part of the code.
No, memory leaks by themselves would not cause a segmentation fault. However, memory leaks usually indicate sloppy code, and in sloppy code other issues, which would cause a segmentation fault, are likely to be present.
GDB and Valgrind are great helpful tools to detect and correct segmentation fault and memory leaks.
This is likely a problem caused by subtle mismatches between libstdc++
ABIs. GCC 4.9 is not the system compiler on Red Hat Enterprise Linux 5, so it's not quite clear what you are using there (DTS 3?).
The locale implementation is known to be quite sensitive to ABI mismatches. See this thread on the gcc-help list:
Your best bet is to figure out which bits of libstdc++
where linked where, and somehow achieve consistency (either by hiding symbols, or recompiling things so that they are compatible).
It may also be useful to investigate the hybrid linkage model used for libstdc++
in Red Hat's Developer Toolset (where newer bits are linked statically, but the bulk of the C++ standard library uses the existing system DSO), but the system libstdc++
in Red hat Enterprise Linux 5 might be too old for that if you need support for current language features.
Given the point of crash, and the fact that preloading libpthread
seems to fix it, I believe that the execution of the two cases diverges at locale_init.cc:315
. Here is an extract of the code:
void locale::_S_initialize() { #ifdef __GTHREADS if (__gthread_active_p()) __gthread_once(&_S_once, _S_initialize_once); #endif if (!_S_classic) _S_initialize_once(); }
__gthread_active_p()
returns true if your program is linked against pthread, specifically it checks if pthread_key_create
is available. On my system, this symbol is defined in "/usr/include/c++/7.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h" as static inline
, hence it is a potential source of ODR violation.
Notice that LD_PRELOAD=libpthread,so
will always cause __gthread_active_p()
to return true.
__gthread_once
is another inlined symbol that should always forward to pthread_once
.
It's hard to guess what's going on without debugging, but I suspect that you are hitting the true branch of __gthread_active_p()
even when it shouldn't, and the program then crashes because there is no pthread_once
to call.
EDIT: So I did some experiments, the only way I see to get a crash in std::locale::_S_initialize
is if __gthread_active_p
returns true, but pthread_once
is not linked in.
libstdc++ does not link directly against pthread
, but it imports half of pthread_xx
as weak objects, which means they can be undefined and not cause a linker error.
Obviously linking pthread will make the crash disappear, but if I am right, the main issue is that your libstdc++
thinks that it is inside a multi-threaded executable even if we did not link pthread in.
Now, __gthread_active_p
uses __pthread_key_create
to decide if we have threads or no. This is defined in your executable as a weak object (can be nullptr and still be fine). I am 99% sure that the symbol is there because of shared_ptr
(remove it and check nm
again to be sure). So, somehow __pthread_key_create
gets bound to a valid address, maybe because of that last -lpthread
in your linker flags. You can verify this theory by putting a breakpoint at locale_init.cc:315
and checking which branch you take.
EDIT2:
Summary of the comments, the issue is only reproducible if we have all of the following:
ld.gold
instead of ld.bfd
--as-needed
__pthread_key_create
, in this case via instantiation of std::shared_ptr
.pthread
, or linking pthread
after --as-needed
.To answer the questions in the comments:
Why does it use gold by default?
By default it uses /usr/bin/ld
, which on most distro is a symlink to either /usr/bin/ld.bfd
or /usr/bin/ld.gold
. Such default can be manipulated using update-alternatives
. I am not sure why in your case it is ld.gold
, as far as I understand RHEL5 ships with ld.bfd
as default.
And why does gold not add pthread.so dependency to the binary if it is needed?
Because the definition of what is needed is somehow shady. man ld
says (emphasis mine):
--as-needed
--no-as-needed
This option affects ELF DT_NEEDED tags for dynamic libraries mentioned on the command line after the --as-needed option. Normally the linker will add a DT_NEEDED tag for each dynamic library mentioned on the command line, regardless of whether the library is actually needed or not. --as-needed causes a DT_NEEDED tag to only be emitted for a library that at that point in the link satisfies a non-weak undefined symbol reference from a regular object file or, if the library is not found in the DT_NEEDED lists of other needed libraries, a non-weak undefined symbol reference from another needed dynamic library. Object files or libraries appearing on the command line after the library in question do not affect whether the library is seen as needed. This is similar to the rules for extraction of object files from archives. --no-as-needed restores the default behaviour.
Now, according to this bug report, gold
is honoring the "non weak undefined symbol" part, while ld.bfd
sees weak symbols as needed. TBH I do not have a full understanding on this, and there is some discussion on that link as to whether this is to be considered a ld.gold
bug, or a libstdc++
bug.
Why do I need to mention -pthread and -lpthread both? (-pthread is passed by default by our build system, and I've pass -lpthread to make it work with gold is used).
-pthread
and -lpthread
do different things (see pthread vs lpthread). It is my understanding that the former should imply the latter.
Regardless, you can probably pass -lpthread
only once, but you need to do it before --as-needed
, or use --no-as-needed
after the last library and before -lpthread
.
It is also worth mentioning that I was not able to reproduce this issue on my system (GCC 7.2), even using the gold linker. So I suspect that it has been fixed in a more recent version libstdc++, which might also explain why it does not segfault if you use the system standard library.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With