
How to debug a traceless crash

During development of our application we have encountered a really nasty bug. The symptom is quite simply that the process disappears. The logs just end abruptly, no crash dumps or anything else can be found, and no zombie processes exist. Dr. Watson hasn't noticed anything either, leaving us without any trace.

The error is not simple to reproduce: it takes on average 3-4 hours of doing the same actions repeatedly. So somewhere there is some kind of race condition. We have special functions handling both SEH and normal exceptions, so none of these should go unnoticed.

The debugging must be done on a dedicated computer, because the application runs on very specialized hardware, so only remote debugging is available. And when the remote debugger is connected, C++ Builder doesn't notice that the application is missing, and crashes and burns when we try to do any debugging on the nonexistent process.

We are using a great variety of technologies with this software:

  • OpenGL
  • DirectShow + some COTS filters
  • COM/DCOM
  • Serial COM ports talking to specialized hardware
  • C++ Builder (which uses different stackframes than VC++)

So, as you understand, I do not have much to work with here. What I am doing now is trying to narrow it down by logging in different places in the code, to find out whether there is some particular point at which the error occurs. I am also trying to remove as many aspects as possible of the action I am performing, to make the case as simple as it can be. But this is a really complex operation and the process is taking a lot of time, and time is (as usual) a scarce resource.

I am wondering if anyone out there has good tips for me, either on the cause (in general, what causes a process just to stop without any notification) or on techniques for debugging such an elusive failure?

daramarak asked Dec 17 '22 16:12



1 Answer

When native code under Windows experiences a stack overflow (typically due to infinite recursion) the process sometimes disappears exactly as you describe. The standard error dialogs and exception handling require some stack space, and where there is none left they cannot run. (Later versions of Windows handle this better and should always raise an exception - Windows XP is not "later" under this definition.)

The easiest brute-force way to debug this is to write log messages at the entry (and maybe the exit) of each function. These messages have to go directly to a file, and if you have buffered output (e.g. cout or similar) you should flush it immediately each time. When you manage to cause the crash, you'll have something close to a stack trace that can at least localise the issue.


Infinite recursion is not the only cause of a stack overflow (though it is the most common one). If very large variables (typically arrays with thousands/millions of elements) are allocated on the stack the same issue may occur. In particular, the alloca() "function" can disguise the cause of this type of stack overflow.
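One way to rule out the large-local-variable cause is to move such buffers to the heap and see whether the crash goes away. The sketch below is illustrative only: `average_of_ramp` is a hypothetical function, and the stack size mentioned in the comment is the typical 1 MB default for a Windows thread, which your build may override.

```cpp
#include <cstddef>
#include <vector>

// Risky variant (left commented out): roughly 8 MB of locals, well beyond
// a typical 1 MB thread stack.
//   double samples[1024 * 1024];

// Hypothetical routine, used only for illustration.
double average_of_ramp(std::size_t count) {
    // Heap-backed storage: the stack frame holds only the small vector object.
    std::vector<double> samples(count);
    for (std::size_t i = 0; i < count; ++i) {
        samples[i] = static_cast<double>(i);
    }
    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }
    return count ? sum / static_cast<double>(count) : 0.0;
}
```

A side benefit of this design is that a failed heap allocation throws `std::bad_alloc`, which ordinary exception handlers can catch and log, whereas an oversized stack frame can take the process down without a trace.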

If you run under a debugger and break/log on guard page exceptions you will be notified when the stack is expanding - let the exception be handled, since it is being used to commit more memory and may not actually be related to the issue.


The final non-stack-overflow cause of a disappearing process is a stray call to exit() or ExitProcess(). A full text search should be able to mostly rule this out - a breakpoint on the ExitProcess function in a debugger will do so completely.
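As a complement to the text search, a process can leave a marker in a file whenever it goes through the normal `exit()` path. This is a minimal sketch using the standard `std::atexit`; the names `install_exit_marker` and `write_exit_marker` are hypothetical. Handlers registered this way run for `exit()` and for a return from `main()`, but a direct `ExitProcess()` or `TerminateProcess()` call would generally bypass them, so a missing marker does not by itself distinguish those from a crash.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

namespace {

// Fixed buffer rather than std::string, so it is still valid during shutdown.
char g_exit_log_path[260] = "";

// Appends a marker line and flushes; registered below to run from exit().
void write_exit_marker() {
    if (g_exit_log_path[0] == '\0') return;
    if (std::FILE* f = std::fopen(g_exit_log_path, "a")) {
        std::fputs("exit marker: atexit handlers are running\n", f);
        std::fflush(f);
        std::fclose(f);
    }
}

}  // namespace

// Call once, early in main(). Returns true if the handler was registered.
bool install_exit_marker(const char* path) {
    std::strncpy(g_exit_log_path, path, sizeof g_exit_log_path - 1);
    g_exit_log_path[sizeof g_exit_log_path - 1] = '\0';
    return std::atexit(write_exit_marker) == 0;
}
```

If the marker line appears at the end of the log after a "disappearance", some code path called `exit()` or returned from `main()`, and the search for the stray call can start there.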

Zooba answered Dec 24 '22 05:12
