
Delphi: Debug critical section hang by reporting call stack of running threads on lock "failure"

I'm looking for a way to debug a rare Delphi 7 critical section (TCriticalSection) hang/deadlock. If a thread waits on a critical section for more than, say, 10 seconds, I'd like to produce a report with the stack trace of both the thread currently holding the critical section and the thread that failed to acquire it after waiting 10 seconds. It is OK if an exception is then raised or the application terminates.

I would prefer to continue using critical sections, rather than using other synchronization primitives, if possible, but can switch if necessary (such as to get a timeout feature).

If the tool/method works at runtime outside of the IDE, that is a bonus, since this is hard to reproduce on demand. In the rare case I can duplicate the deadlock inside the IDE, if I try to Pause to start debugging, the IDE just sits there doing nothing, and never gets to a state where I can view threads or call stacks. I can Reset the running program, though.

Update: In this case, I'm only dealing with one critical section and 2 threads, so this likely isn't a lock ordering problem. I believe there is an improper nested attempt to enter the lock across two different threads, which results in deadlock.

asked Sep 15 '10 16:09 by Anagoge


1 Answer

You should create and use your own lock class. It can be implemented with a critical section for normal operation, or with a mutex while you are debugging, since a wait on a mutex can be given a timeout.
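A minimal sketch of such a lock class, assuming a plain Win32 mutex underneath. The names `TDebugLock`, `ELockTimeout`, and the unit name are hypothetical and not part of any library. It only reports thread IDs; capturing the actual call stacks would need a separate stack-tracing tool.

```delphi
unit DebugLock;

interface

uses
  Windows, SysUtils;

type
  ELockTimeout = class(Exception);

  // Hypothetical wrapper: a Win32 mutex used like a critical section,
  // but with a timeout and a record of the thread that currently owns it.
  TDebugLock = class
  private
    FMutex: THandle;
    FOwnerThreadId: Cardinal; // diagnostic only; read unsynchronized on timeout
    FRecursion: Integer;      // only modified while the mutex is held
  public
    constructor Create;
    destructor Destroy; override;
    procedure Enter(TimeoutMs: Cardinal);
    procedure Leave;
  end;

implementation

constructor TDebugLock.Create;
begin
  inherited Create;
  FMutex := CreateMutex(nil, False, nil);
  if FMutex = 0 then
    RaiseLastOSError;
end;

destructor TDebugLock.Destroy;
begin
  if FMutex <> 0 then
    CloseHandle(FMutex);
  inherited Destroy;
end;

procedure TDebugLock.Enter(TimeoutMs: Cardinal);
var
  Holder: Cardinal;
begin
  case WaitForSingleObject(FMutex, TimeoutMs) of
    WAIT_OBJECT_0, WAIT_ABANDONED:
      begin
        // Like a critical section, a mutex can be re-acquired by its owner,
        // so count nested acquisitions.
        Inc(FRecursion);
        FOwnerThreadId := GetCurrentThreadId;
      end;
    WAIT_TIMEOUT:
      begin
        Holder := FOwnerThreadId;
        raise ELockTimeout.CreateFmt(
          'Thread %d waited %d ms for a lock last held by thread %d',
          [GetCurrentThreadId, TimeoutMs, Holder]);
      end;
  else
    RaiseLastOSError;
  end;
end;

procedure TDebugLock.Leave;
begin
  if FOwnerThreadId <> GetCurrentThreadId then
    raise Exception.Create('Leave called by a thread that does not own the lock');
  Dec(FRecursion);
  if FRecursion = 0 then
    FOwnerThreadId := 0;
  ReleaseMutex(FMutex);
end;

end.
```

Call sites would pair `Enter(10000)` with `Leave` in a `try..finally` block. When `ELockTimeout` is raised, the message names both the waiting thread and the last recorded owner, which tells you whose stacks to look at.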

Creating your own class has an added benefit: you can implement a locking hierarchy and raise an exception when it is violated. Lock-ordering deadlocks become possible whenever threads do not take the same locks in the same order every time. Assigning a lock level to each lock makes it possible to check that locks are taken in the correct order. You could store the current lock level in a threadvar and only allow a thread to take locks with a lower level than the one it currently holds; otherwise you raise an exception. This catches every ordering violation, even when no deadlock actually occurs, so it should speed up your debugging a lot.
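The level check described above could look roughly like this. It is a sketch only: `TLeveledLock`, `ELockOrderViolation`, and the convention that levels start at 1 (so that a zero-initialized threadvar means "no lock held") are assumptions made for the example. It also assumes locks are released in the reverse order they were taken, and it flags re-entering the same lock as a violation.

```delphi
unit LockLevels;

interface

uses
  SysUtils, SyncObjs;

type
  ELockOrderViolation = class(Exception);

  // Hypothetical lock class that enforces "only take lower levels".
  TLeveledLock = class
  private
    FSection: TCriticalSection;
    FLevel: Integer;
    FPrevLevel: Integer; // level the owner held before entering; guarded by FSection
  public
    constructor Create(ALevel: Integer);
    destructor Destroy; override;
    procedure Enter;
    procedure Leave;
  end;

implementation

threadvar
  // Level of the innermost lock held by the current thread.
  // Threadvars start at zero, which here means "no lock held".
  CurrentLevel: Integer;

constructor TLeveledLock.Create(ALevel: Integer);
begin
  inherited Create;
  if ALevel < 1 then
    raise ELockOrderViolation.Create('Lock levels must be 1 or higher');
  FLevel := ALevel;
  FSection := TCriticalSection.Create;
end;

destructor TLeveledLock.Destroy;
begin
  FSection.Free;
  inherited Destroy;
end;

procedure TLeveledLock.Enter;
var
  Held: Integer;
begin
  Held := CurrentLevel;
  // Check before blocking, so a violation is reported instead of deadlocking.
  if (Held <> 0) and (FLevel >= Held) then
    raise ELockOrderViolation.CreateFmt(
      'Lock level %d requested while holding level %d', [FLevel, Held]);
  FSection.Enter;
  FPrevLevel := Held;
  CurrentLevel := FLevel;
end;

procedure TLeveledLock.Leave;
begin
  CurrentLevel := FPrevLevel;
  FSection.Leave;
end;

end.
```

Because the check runs on every `Enter`, a wrong ordering is reported the first time it is executed, not only on the rare run where the timing actually produces a deadlock.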

As for getting the stack trace of the threads, there are many questions here on Stack Overflow dealing with this.

Update

You write:

"In this case, I'm only dealing with one critical section and 2 threads, so this likely isn't a lock ordering problem. I believe there is an improper nested attempt to enter the lock across two different threads, which results in deadlock."

That can't be the whole story. On Windows there is no way to deadlock with two threads and a single critical section alone, because a thread that already owns a critical section can acquire it again recursively. Another blocking mechanism has to be involved, such as a SendMessage() call.
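To illustrate the kind of interaction meant here, the following sketch shows a worker thread that holds the critical section while calling SendMessage(). The names `TWorker` and `WM_WORKER_NOTIFY` are made up for the example, and this is a guess at the pattern, not a reconstruction of your code.

```delphi
unit DeadlockSketch;

interface

uses
  Windows, Messages, Classes, SyncObjs;

const
  WM_WORKER_NOTIFY = WM_USER + 1; // hypothetical message ID

type
  TWorker = class(TThread)
  private
    FLock: TCriticalSection; // shared with the main thread, not owned here
    FTarget: HWND;           // a window created by the main thread
  protected
    procedure Execute; override;
  public
    constructor Create(ALock: TCriticalSection; ATarget: HWND);
  end;

implementation

constructor TWorker.Create(ALock: TCriticalSection; ATarget: HWND);
begin
  inherited Create(True); // created suspended; the caller calls Resume
  FLock := ALock;
  FTarget := ATarget;
end;

procedure TWorker.Execute;
begin
  FLock.Enter;
  try
    // SendMessage does not return until the thread that owns FTarget
    // has processed the message.
    SendMessage(FTarget, WM_WORKER_NOTIFY, 0, 0);
  finally
    FLock.Leave;
  end;
end;

// If the main thread calls FLock.Enter (for example in an event handler)
// while the worker is inside SendMessage, the main thread blocks on the
// lock and never processes the message, and the worker never releases
// the lock: a deadlock with one critical section and two threads.

end.
```

Nested Enter calls made by the worker itself would succeed; it is the wait on the other thread's message processing that closes the cycle.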

But if you really are dealing with only two threads, then one of them has to be the main (VCL/GUI) thread. In that case you should be able to use the madExcept "Main thread freeze checking" feature. It tries to send a message to the main thread and reports a failure if the message has not been handled within a customizable time. If your main thread is blocked on the critical section and the other thread is blocked in a message-handling call, then madExcept should be able to catch this and give you a stack trace for both threads.

answered Sep 25 '22 19:09 by mghie