I have a multi-threaded C++ program that deadlocks in some rare cases. The problem is hard to reproduce, and I can only reproduce it on a remote machine. The method I want to use for solving this problem is to take a core dump on the remote machine and debug it locally,
because I do not have gdb on the remote machine and cannot install anything on it. The problem is that when I debug the core dump (obtained from either a deadlocked or a normally running process on the remote machine), the backtrace of most of the threads shows only:
(gdb) bt
#0  pthread_cond_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:261
#1  0x0000000000000000 in ?? ()
I am using a statically linked binary compiled with the "-g -O1" options. When I abort a process of the same binary on my local machine, gdb can extract the entire stack from the core dump and there is no such problem (I cannot reproduce the deadlock locally, however). My remote machine runs SLES and my local machine runs Ubuntu.
Any idea?
Edit:
Found someone else with the same problem, but still with no solution: http://groups.google.com/group/google-coredumper/browse_thread/thread/2ca9bcf9465d1050 (I am not using google coredumper, but it seems to fail with the same error, which suggests that the problem may be with SLES 11.)
Note that you can also use gcore to create a core file without aborting. Have you tried running pstack on the remote host (assuming it's installed) to see if you can get a backtrace that way?
Otherwise, if the shared objects used by your application are different on your local host and remote host, gdb won't be able to match the memory offsets properly and the backtrace will probably come out garbled. If you're able to copy all the relevant .so files from the remote host to some place locally, I believe you can direct gdb to read from them instead of the normally installed versions.
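As a rough sketch of that last suggestion: gdb has a "set sysroot" command that makes it look for shared libraries under a given directory prefix instead of the local system paths. The helper below only prints a gdb command line so you can review it before running it. The function name gdb_backtrace_cmd and the paths ./myapp, ./core.1234 and ./remote-root are made up for illustration; ./remote-root is assumed to hold the copied .so files in the same directory layout they had on the remote host (for example ./remote-root/lib64/libpthread.so.0).

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a gdb command line that loads
# a core file while resolving shared libraries under a copied remote tree.
# Arguments: 1 = binary, 2 = core file, 3 = directory with the remote libs.
gdb_backtrace_cmd() {
    binary=$1
    core=$2
    sysroot=$3
    # "set sysroot" has to be issued before the core file is loaded,
    # so the binary and core are loaded with -ex commands after it.
    printf 'gdb -batch -ex "set sysroot %s" -ex "file %s" -ex "core-file %s" -ex "thread apply all bt"\n' \
        "$sysroot" "$binary" "$core"
}

# Example with made-up paths; this prints the command for review.
gdb_backtrace_cmd ./myapp ./core.1234 ./remote-root
```

If the printed command looks right, paste it into a shell to run it. Inside an interactive gdb session, "info sharedlibrary" shows which library files gdb actually picked up, which is a quick way to check whether the copied versions are being used.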
EDIT: try running pstack on your build machine and see if it can pick up a stack.