Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Detecting application hang

Tags:

c++

windows

I have a very large, complex (million+ LOC) Windows application written in C++. We receive a handful of reports every day that the application has locked up, and must be forcefully shut down.

While we have extensive reporting about crashes in place, I would like to expand this to include these hang scenarios -- even with heavy logging in place, we have not been able to track down root causes for some of these. We can clearly see where activity stopped - but not why it stopped, even in evaluating output of all threads.

The problem is detecting when a hang occurs. So far, the best I can come up with is a watchdog thread (as we have evidence that background threads are continuing to run w/out issues) which periodically pings the main window with a custom message, and confirms that it is handled in a timely fashion. This would only capture GUI thread hangs, but this does seem to be where the majority of them are occurring. If a reply was not received within a configurable time frame, we would capture a memory and stack dump, and give the user the option of continuing to wait or restarting the app.

Does anyone know of a better way to do this than such a periodic polling of the main window in this way? It seems painfully clumsy, but I have not seen alternatives that will work on our platforms -- Windows XP, and Windows 2003 Server. I see that Vista has much better tools for this, but unfortunately that won't help us.

Suffice it to say that we have done extensive diagnostics on this and have been met with only limited success. Note that attaching windbg in real-time is not an option, as we don't get the reports until hours or days after the incident. We would be able to retrieve a memory dump and log files, but nothing more.

Any suggestions beyond what I'm planning above would be appreciated.

like image 355
Marc Paradise Avatar asked Dec 16 '09 21:12

Marc Paradise


2 Answers

The answer is simple: SendMessageTimeout!

Using this API you can send a message to a window and wait for a timeout before continuing; if the application responds before timeout the is still running otherwise it is hung.

like image 131
Kristina Brooks Avatar answered Sep 21 '22 16:09

Kristina Brooks


One option is to run your program under your own "debugger" all the time. Some programs, such as GetRight, do this for copy protection, but you can also do it to detect hangs. Essentially, you include in your program some code to attach to a process via the debugging API and then use that API to periodically check for hangs. When the program first starts, it checks if there's a debugger attached to it and, if not, it runs another copy of itself and attaches to it - so the first instance does nothing but act as the debugger and the second instance is the "real" one.

How you actually check for hangs is another whole question, but having access to the debugging API there should be some way to check reasonably efficiently whether the stack has changed or not (ie. without loading all the symbols). Still, you might only need to do this every few minutes or so, so even if it's not efficient it might be OK.

It's a somewhat extreme solution, but should be effective. It would also be quite easy to turn this behaviour on and off - a command-line switch will do or a #define if you prefer. I'm sure there's some code out there that does things like this already, so you probably don't have to do it from scratch.

like image 20
EMP Avatar answered Sep 17 '22 16:09

EMP