
Why is my thread blocked by a critical section not being held by anything?

I am having an issue with a critical section in C++. I'm getting a hung window and when I dump the process I can see the thread waiting on a critical section:

  16  Id: b10.b88 Suspend: 1 Teb: 7ffae000 Unfrozen
ChildEBP RetAddr  
0470f158 7c90df3c ntdll!KiFastSystemCallRet
0470f15c 7c91b22b ntdll!NtWaitForSingleObject+0xc
0470f1e4 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
0470f1ec 0415647e ntdll!RtlEnterCriticalSection+0x46

The line data and other details all indicate entry into a specific critical section. The only problem is that no other thread appears to be holding this critical section. WinDbg's !locks command reports nothing, and dumping the critical section indicates it is not locked, as shown by the null owner and the LockCount of -1 in the structure below.

0:016> dt _RTL_CRITICAL_SECTION 42c2318
_RTL_CRITICAL_SECTION
   +0x000 DebugInfo        : 0x02c8b318 _RTL_CRITICAL_SECTION_DEBUG
   +0x004 LockCount        : -1
   +0x008 RecursionCount   : -1
   +0x00c OwningThread     : (null) 
   +0x010 LockSemaphore    : 0x00000340 
   +0x014 SpinCount        : 0

0:016> dt _RTL_CRITICAL_SECTION_DEBUG 2c8b318
_RTL_CRITICAL_SECTION_DEBUG
   +0x000 Type             : 0
   +0x002 CreatorBackTraceIndex : 0x2911
   +0x004 CriticalSection  : 0x042c2318 _RTL_CRITICAL_SECTION
   +0x008 ProcessLocksList : _LIST_ENTRY [ 0x2c8b358 - 0x2c8b2e8 ]
   +0x010 EntryCount       : 1
   +0x014 ContentionCount  : 1
   +0x018 Flags            : 0xbaadf00d
   +0x01c CreatorBackTraceIndexHigh : 0xf00d
   +0x01e SpareWORD        : 0xbaad

How is this possible? Even in a deadlock where another thread has not called LeaveCriticalSection, I would expect to see the critical section itself marked as locked. Does anyone have any debugging suggestions or possible fixes?

dlanod asked Jan 12 '12 04:01


1 Answer

It turned out to be a bug where LeaveCriticalSection was being called without a corresponding EnterCriticalSection. This caused the critical section to decrement LockCount and RecursionCount into the following state (the default for LockCount is -1 and RecursionCount is 0):

0:016> dt _RTL_CRITICAL_SECTION 1092318
_RTL_CRITICAL_SECTION
    +0x000 DebugInfo        : 0x....... _RTL_CRITICAL_SECTION_DEBUG
    +0x004 LockCount        : -2
    +0x008 RecursionCount   : -1
    +0x00c OwningThread     : (null)
    +0x010 LockSemaphore    : 0x....... 
    +0x014 SpinCount        : 0 

When the subsequent EnterCriticalSection was performed, it hung because RecursionCount was non-zero: a thread can only take ownership of the critical section if RecursionCount is 0. To confuse matters, the call did still increment LockCount, taking it back to the -1 seen in my original question.

In summary, if you see a thread blocked on a critical section whose LockCount and RecursionCount are both -1, it means there was excessive unlocking.
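One way to catch this kind of excessive unlocking early is a debug wrapper that refuses an unmatched leave instead of silently corrupting the lock state. The following is only a minimal, portable sketch: CheckedLock and its Enter/Leave members are hypothetical names, and it is built on std::recursive_mutex rather than on CRITICAL_SECTION, so it illustrates the bookkeeping idea and is not a drop-in replacement for the Windows API.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical debug wrapper (not a Windows API): a recursive lock that
// refuses a Leave() from any thread that does not currently hold it.
class CheckedLock {
public:
    ~CheckedLock() {
        // Destroying a lock that is still held is another balance bug.
        assert(depth_ == 0 && "CheckedLock destroyed while held");
    }

    void Enter() {
        lock_.lock();  // blocks until this thread owns the lock
        std::lock_guard<std::mutex> g(state_);
        owner_ = std::this_thread::get_id();
        ++depth_;
    }

    void Leave() {
        {
            std::lock_guard<std::mutex> g(state_);
            // Reject the call unless the calling thread holds the lock.
            if (depth_ == 0 || owner_ != std::this_thread::get_id())
                throw std::logic_error("Leave without matching Enter");
            if (--depth_ == 0)
                owner_ = std::thread::id();  // no owner any more
        }
        lock_.unlock();
    }

private:
    std::recursive_mutex lock_;  // the actual lock
    std::mutex state_;           // guards the bookkeeping below
    std::thread::id owner_;      // default-constructed id means "nobody"
    int depth_ = 0;              // Enter() calls not yet matched by Leave()
};
```

With a wrapper like this, a stray Leave() throws at the call site that caused the imbalance, rather than surfacing later as an unexplained hang in an unrelated Enter().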

As to the code causing it:

if (SysStringLen(bstrState) > 0)
    CHECKHR_CS( m_pStateManager->SetState(bstrState), &m_csStateManagerLock );

And the definition of the error-checking macro:

#define CHECKHR_CS(x, cs)           \
    EnterCriticalSection(cs);       \
    if( FAILED(hr = (x)) ) {        \
        LeaveCriticalSection(cs);   \
        return hr;                  \
    }                               \
    LeaveCriticalSection(cs);

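The macro does not wrap its contents in curly braces, so when it is used as the body of a brace-less if, only the first statement, EnterCriticalSection, is conditional. When the condition is false, EnterCriticalSection is skipped but the rest of the macro still runs, ending with an unmatched LeaveCriticalSection.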
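The usual remedy for a multi-statement macro is to wrap its body in do { ... } while (0) so that it expands to a single statement. The sketch below reproduces the problem and the fix side by side. To stay portable it uses stand-ins that are assumptions and not part of the original code: FakeLock only counts calls, Enter/Leave replace EnterCriticalSection/LeaveCriticalSection, a plain int rc replaces the HRESULT hr, and DoWork replaces SetState.

```cpp
#include <cassert>

// Hypothetical stand-in for CRITICAL_SECTION so the sketch builds anywhere.
// It only counts calls; it does no real locking.
struct FakeLock {
    int enters = 0;
    int leaves = 0;
};

inline void Enter(FakeLock* l) { ++l->enters; }
inline void Leave(FakeLock* l) { ++l->leaves; }

// Same shape as the original macro: several statements, no enclosing block.
#define CHECK_UNBRACED(x, cs) \
    Enter(cs);                \
    if ((rc = (x)) < 0) {     \
        Leave(cs);            \
        return rc;            \
    }                         \
    Leave(cs);

// Wrapped in do { ... } while (0) so the expansion is a single statement.
#define CHECK_BRACED(x, cs)   \
    do {                      \
        Enter(cs);            \
        if ((rc = (x)) < 0) { \
            Leave(cs);        \
            return rc;        \
        }                     \
        Leave(cs);            \
    } while (0)

int DoWork() { return 0; }  // stand-in for SetState; always succeeds

int CallUnbraced(bool cond, FakeLock* l) {
    int rc = 0;
    if (cond)
        CHECK_UNBRACED(DoWork(), l);  // only Enter(l) is guarded by the if
    return rc;
}

int CallBraced(bool cond, FakeLock* l) {
    int rc = 0;
    if (cond)
        CHECK_BRACED(DoWork(), l);    // the whole body is guarded by the if
    return rc;
}

void Demo() {
    FakeLock a, b;
    CallUnbraced(false, &a);
    assert(a.enters == 0 && a.leaves == 1);  // one unmatched Leave
    CallBraced(false, &b);
    assert(b.enters == 0 && b.leaves == 0);  // nothing ran
}
```

Putting braces around the if body at the call site would also avoid the hang, but fixing the macro protects every caller. In C++ an RAII guard object that leaves the critical section in its destructor is another common option, since it removes the need to pair the calls by hand.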

dlanod answered Sep 29 '22 13:09