When stack corruption is detected, inspect the local variables in the called and calling functions for possible sources of memory corruption. Check array and pointer declarations in particular. Stray corruption of a processor's registers can also be a symptom of stack corruption.
The frame command selects a stack frame or displays the currently selected stack frame.
You can view and traverse the function call stack using the where, up, down and frame commands. To debug programs with functions (i.e. most programs), it helps to inspect the variables of every function in the current call stack, i.e. the functions called to get to the current point in the program.
Those bogus addresses (0x00000002 and the like) are actually PC values, not SP values. When you get this kind of SEGV, with a bogus (very small) PC address, 99% of the time it's due to calling through a bogus function pointer. Note that virtual calls in C++ are implemented via function pointers, so any problem with a virtual call can manifest in the same way.
An indirect call instruction just pushes the PC after the call onto the stack and then sets the PC to the target value (bogus in this case), so if this is what happened, you can easily undo it by manually popping the PC off the stack. In 32-bit x86 code you just do:
(gdb) set $pc = *(void **)$esp
(gdb) set $esp = $esp + 4
With 64-bit x86 code you need:
(gdb) set $pc = *(void **)$rsp
(gdb) set $rsp = $rsp + 8
Then, you should be able to do a bt and figure out where the code really is.
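To see why those two commands recover the caller, here is a minimal toy model of an indirect call in Python. It is only a sketch and not GDB itself: the register dictionary, the stack dictionary and every address in it are made up for illustration.

```python
# Toy model of an x86-64 indirect call. Not real GDB output;
# all addresses below are hypothetical.
WORD = 8  # bytes per stack slot on 64-bit x86


def indirect_call(regs, stack, target, return_addr):
    """Model `call *target`: push the return address, then jump."""
    regs["rsp"] -= WORD
    stack[regs["rsp"]] = return_addr
    regs["pc"] = target


def undo_call(regs, stack):
    """Mirror the two gdb commands: pc = *(void **)rsp; rsp += 8."""
    regs["pc"] = stack[regs["rsp"]]
    regs["rsp"] += WORD


regs = {"pc": 0x400500, "rsp": 0x7FFFFFFFE000}
stack = {}

# The call goes through a bogus function pointer holding the value 2.
indirect_call(regs, stack, target=0x2, return_addr=0x400505)
crashed_pc = regs["pc"]    # the tiny, bogus PC seen in the SEGV

undo_call(regs, stack)
recovered_pc = regs["pc"]  # the instruction right after the faulting call
print(hex(crashed_pc), hex(recovered_pc), hex(regs["rsp"]))
```

After undo_call, the PC points just past the call instruction in the caller and the stack pointer is back where it was before the call, which is the state bt needs in order to unwind.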
The other 1% of the time, the error will be due to overwriting the stack, usually by overflowing an array stored on the stack. In this case, you might be able to get more clarity on the situation by using a tool like valgrind.
If the situation is fairly simple, Chris Dodd's answer is the best one. It does look like it jumped through a NULL pointer.
However, it is possible the program shot itself in the foot, knee, neck, and eye before crashing—overwrote the stack, messed up the frame pointer, and other evils. If so, then unraveling the hash is not likely to show you potatoes and meat.
A more efficient approach is to run the program under the debugger and step over functions until the program crashes. Once a crashing function is identified, start again, step into that function, and determine which of the functions it calls causes the crash. Repeat until you find the single offending line of code. 75% of the time, the fix will then be obvious.
In the other 25% of situations, the so-called offending line of code is a red herring. It will be reacting to (invalid) conditions set up many lines before, maybe thousands of lines before. If that is the case, the best course depends on many factors, mostly your understanding of the code and your experience with it. Sometimes printf's on critical variables will lead to the necessary A ha!
Good luck!
Assuming that the stack pointer is valid...
It may be impossible to know exactly where the SEGV occurs from the backtrace -- I think the first two stack frames are completely overwritten. 0xbffff284 seems like a valid address, but the next two aren't. For a closer look at the stack, you can try the following:
gdb$ x/32ga $rsp
or a variant (replace the 32 with another number). That prints 32 words starting from the stack pointer, each of giant (g) size, i.e. 8 bytes, formatted as addresses (a). Type 'help x' for more info on the format.
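As a rough illustration of what the g and a letters mean, the sketch below decodes a block of raw bytes the same way: as consecutive 8-byte little-endian words (x86 is little-endian), each printed in hex next to the address it was read from. The helper name dump_giant_words, the base address and the memory contents are all assumptions made up for this example, and gdb's real output layout differs.

```python
import struct


def dump_giant_words(memory: bytes, base: int, count: int):
    """Return (address, value) pairs for `count` 8-byte little-endian words."""
    values = struct.unpack_from(f"<{count}Q", memory, 0)
    return [(base + 8 * i, value) for i, value in enumerate(values)]


# Hypothetical stack contents: four giant words, 32 bytes in total.
raw = struct.pack("<4Q", 0x0, 0x342D, 0x400505, 0x7FFFFFFFE040)
rows = dump_giant_words(raw, base=0x7FFFFFFFE000, count=4)
for addr, value in rows:
    print(f"{addr:#x}:\t{value:#x}")
```

Values that look like code addresses in such a dump are candidate return addresses, while values close to the stack pointer are likely saved frame pointers.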
Instrumenting your code with some sentinel printf's may not be a bad idea in this case.
Look at some of your other registers to see if one of them has the stack pointer cached in them. From there, you might be able to retrieve a stack. Also, if this is embedded, quite often stack is defined at a very particular address. Using that, you can also sometimes get a decent stack. This all assumes that when you jumped to hyperspace, your program didn't puke all over memory along the way...
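If you know roughly where the stack lives, checking the registers can be done mechanically. The sketch below is only an illustration: the register values and the stack bounds are hypothetical, standing in for numbers you would copy by hand from info registers and from your linker script or memory map.

```python
def stack_pointer_candidates(registers, stack_low, stack_high):
    """Return names of registers whose value lies in [stack_low, stack_high)."""
    return sorted(
        name
        for name, value in registers.items()
        if stack_low <= value < stack_high
    )


# Hypothetical register values for a crashed 64-bit process.
registers = {
    "rsp": 0x2,             # clobbered
    "rbp": 0x7FFFFFFFE040,  # still points into the stack
    "rbx": 0x601040,        # a data address, not the stack
    "r12": 0x7FFFFFFFDFF8,  # a cached stack address
}
candidates = stack_pointer_candidates(registers, 0x7FFFFFFDE000, 0x7FFFFFFFF000)
print(candidates)
```

Any register reported here is a starting point for examining memory with x, in place of the corrupted stack pointer.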
If it's a stack overwrite, the values may well correspond to something recognisable from the program.
For example, I just found myself looking at the stack:
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x000000000000342d in ?? ()
#2 0x0000000000000000 in ?? ()
and 0x342d is 13357, which turned out to be a node-id when I grepped the application logs for it. That immediately helped narrow down candidate sites where the stack overwrite might have occurred.
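The same conversion can be scripted when there are many frames to check. This is a minimal sketch that assumes the backtrace has been pasted into a string in the format shown above. The function name and the regular expression are my own, not part of gdb.

```python
import re

# Matches unsymbolized frames such as "#1 0x000000000000342d in ?? ()".
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+) in \?\? \(\)")


def unresolved_frame_values(backtrace: str):
    """Return (frame number, decimal value) for nonzero unsymbolized frames."""
    found = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            value = int(match.group(2), 16)
            if value != 0:
                found.append((int(match.group(1)), value))
    return found


backtrace = """\
#0 0x0000000000000000 in ?? ()
#1 0x000000000000342d in ?? ()
#2 0x0000000000000000 in ?? ()
"""
values = unresolved_frame_values(backtrace)
print(values)  # [(1, 13357)]
```

The decimal values are the ones worth grepping for in application logs, since logs usually print ids and counters in decimal rather than hex.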