Distributed software debug with gdb

I am currently developing distributed software in C++ on Linux that runs on more than 20 nodes simultaneously. One of the most challenging issues I have found is how to debug it.

I heard that it is possible to manage multiple remote sessions from a single gdb session (e.g. I create the gdb session on the master node and launch the program under gdbserver on every other node). Is that possible? If so, can you give an example? Do you know any other way to do it?

Thanks

494

asked Feb 09 '14 07:02

Vicente Bolea

2 Answers

You can try it like this:

First, start gdbserver on each remote host. You can even start it without a program to debug by passing the --multi flag. In multi mode, gdbserver can be controlled from your local gdb session, which means you can tell it which program to start. Then create multiple inferiors in your gdb session:

gdb> add-inferior -copies <number of servers>

Then switch each inferior to a remote target and connect it to its remote server:

gdb> inferior 1
gdb> target extended-remote host:port
# extended-remote keeps the connection open and lets gdb start programs
# on a gdbserver that was started with --multi; start the program here
gdb> inferior 2
...

Now they are all attached to one gdb session. The problem is that, AFAIK, this is not much better than starting multiple gdb instances in different console tabs. On the other hand, it lets you write scripts or automated tests. See the gdb tutorial: server and inferiors.
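As an illustration of the scripting idea, here is a minimal sketch that generates the gdb commands shown above for a list of servers. The function name make_gdb_script and the host:port values are made up for the example, and it assumes a fresh gdb session (so the new inferiors are numbered from 2) and a gdb recent enough to hold several remote connections at once (as far as I know, that needs GDB 10 or later).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Builds the text of a gdb command file that connects one inferior to each
// gdbserver endpoint. The "host:port" strings are placeholders (assumption).
std::string make_gdb_script(const std::vector<std::string>& endpoints) {
    std::ostringstream out;
    if (endpoints.empty()) {
        return out.str();
    }
    // A fresh gdb session already has inferior 1, so only size-1 copies
    // have to be added.
    if (endpoints.size() > 1) {
        out << "add-inferior -copies " << endpoints.size() - 1 << "\n";
    }
    for (std::size_t i = 0; i < endpoints.size(); ++i) {
        out << "inferior " << i + 1 << "\n";
        out << "target extended-remote " << endpoints[i] << "\n";
    }
    return out.str();
}
```

The returned text can be written to a file and loaded with gdb -x, which saves typing the same commands for 20 nodes.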

151

answered Oct 19 '22 21:10

Pavel Davydov

I don't believe there is one simple answer to debugging "many remote applications". Yes, you can attach to a process on another machine and step through it in GDB. But it's quite awkward to debug a large number of interdependent processes, especially when the problem is complicated.

I believe a good set of logging capabilities in the code, supplemented with additional logs for specific debugging as needed, is more likely to give you a good/fast result.
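To make the logging suggestion concrete, here is a rough sketch of a helper that stamps every line with a node name and a timestamp, so logs from many nodes can be merged and sorted afterwards. The names format_log_line and log_line and the line layout are my own invention, and merging by timestamp only works if the clocks on the nodes are reasonably synchronized.

```cpp
#include <chrono>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>

// Formats one log line; kept separate from the I/O so it is easy to test.
std::string format_log_line(const std::string& node, long long micros,
                            const std::string& message) {
    std::ostringstream out;
    out << "[" << micros << "us] [" << node << "] " << message;
    return out.str();
}

// Writes one line to stderr, stamped with the node name and the wall-clock
// time in microseconds since the epoch.
void log_line(const std::string& node, const std::string& message) {
    static std::mutex guard;  // keeps lines from different threads intact
    const auto now = std::chrono::system_clock::now().time_since_epoch();
    const long long micros =
        std::chrono::duration_cast<std::chrono::microseconds>(now).count();
    std::lock_guard<std::mutex> lock(guard);
    std::cerr << format_log_line(node, micros, message) << std::endl;
}
```

A call such as log_line("node07", "received block") would then produce one self-describing line per event on each node.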

Another option might be to run the processes on one machine rather than on multiple machines, or perhaps even to use threads within one process to simulate the behaviour of multiple machines, which simplifies debugging. Of course, this doesn't help with bugs that appear ONLY when you run 20 processes on 20 different machines. But the basic idea is to reduce the number of those bugs to a minimum, and debug most things in a "simpler environment".
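A toy sketch of the "threads as nodes" idea follows. node_work and simulate_cluster are hypothetical names, and the work done (summing a slice of integers) is only a stand-in for whatever each real node computes. The whole "cluster" lives in one process, so a single ordinary gdb session can stop and inspect every simulated node.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the work one node would do; here it just sums its slice.
long node_work(int node_id, int items_per_node) {
    long sum = 0;
    const int first = node_id * items_per_node;
    for (int i = first; i < first + items_per_node; ++i) {
        sum += i;
    }
    return sum;
}

// Runs node_count simulated nodes as threads and gathers their results,
// much as a master node would collect replies over the network.
long simulate_cluster(int node_count, int items_per_node) {
    long total = 0;
    std::mutex total_guard;
    std::vector<std::thread> nodes;
    for (int id = 0; id < node_count; ++id) {
        nodes.emplace_back([&, id] {
            const long part = node_work(id, items_per_node);
            std::lock_guard<std::mutex> lock(total_guard);
            total += part;  // the "reply" sent back to the master
        });
    }
    for (std::thread& node : nodes) {
        node.join();
    }
    return total;
}
```

In real code the lambda would call the same entry point that each node runs, with the network layer replaced by an in-process queue.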

Aggressive use of defensive programming techniques, such as liberal use of assert, is clearly a good idea (perhaps with a macro to turn it off for production runs). But make sure that you don't just leave error paths completely unchecked: it is MUCH easier to detect a failed memory allocation at the point where it fails than to track down where a NULL pointer came from some 20 function calls away from the failed allocation.
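A small sketch of that distinction, with made-up names (checked_malloc, make_buffer): assert covers programming errors and disappears when NDEBUG is defined, while the allocation check stays on in production and reports the failure where it happens.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>

// Always-on check: the failure is reported at the allocation itself, not
// many calls later when the null pointer is finally dereferenced.
void* checked_malloc(std::size_t bytes) {
    void* block = std::malloc(bytes);
    if (block == nullptr) {
        throw std::runtime_error("checked_malloc: allocation failed");
    }
    return block;
}

// Example caller. The assert documents a precondition and is compiled out
// when NDEBUG is defined; the allocation check above is not.
int* make_buffer(std::size_t count) {
    assert(count > 0);
    return static_cast<int*>(checked_malloc(count * sizeof(int)));
}
```

The caller owns the returned buffer and releases it with std::free.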

answered Oct 19 '22 21:10

Mats Petersson

Tags: c++, linux, parallel-processing, gdb, distributed-computing
