I recently got myself a dump file generated via procdump when my application remained unresponsive for a while.
When I run !locks on the dump file, I get a lone entry that goes something like:
0:000> !locks
CritSec +123456 at 00123456
WaiterWoken No
LockCount 0
RecursionCount 2
OwningThread aaaa
EntryCount 0
ContentionCount 0
*** Locked
Just that one listing. That's all. When I drill down further with:
0:000> dt RTL_CRITICAL_SECTION 00123456
MyModule!RTL_CRITICAL_SECTION
+0x000 DebugInfo : 0x00aabbcc _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount : 0n-2
+0x008 RecursionCount : 0n2
+0x00c OwningThread : 0x0000aaaa Void
+0x010 LockSemaphore : (null)
+0x014 SpinCount : 0
Questions:
1) Should I take MyModule!RTL_CRITICAL_SECTION
as definitive clue this critical section is probably defined inside MyModule?
2) Is there a way to make Windbg display the actual variable name of this critical section? (i.e assuming #1 is true and this is a CS the application code has defined/access to.)
3) Why do the LockCount values in the aforementioned listings at odds with each other? (one has 0 and the other has -2.)
4) I think I understand LockCount well enough to know it can't go below -1. Not to mention RecursionCount seems to be badly out of step with LockCount.
I guess the bottom line question is should I just chalk this up to corrupted CS?
I could easily reproduce the effects of negative LockCount with a sample application, and here are my answers:
Regarding 1)
Should I take MyModule!RTL_CRITICAL_SECTION as definitive clue this critical section is probably defined inside MyModule?
Yes, it's your definition of a critical section, which may or may not correspond to Microsoft's definition. To use Microsoft's definition use dt nt!_RTL_CRITICAL_SECTION
Regarding 2)
Is there a way to make Windbg display the actual variable name of this critical section? (i.e assuming #1 is true and this is a CS the application code has defined/access to.)
Yes, if it's used by a function that's on the call stack. With .frame
, navigate to the frme, and with ?? variableName
display the variable, e.g.
0:000> k L2
# ChildEBP RetAddr
00 0116faa4 00d67419 KERNELBASE!DebugBreak+0x2
01 0116fc5c 00d67ebe CriticalSectionLeaveTwice!main+0x109
0:000> .frame 1
01 0116fc5c 00d67ebe CriticalSectionLeaveTwice!main+0x109 [c:\users\t\documents\visual studio 2015\projects\criticalsectionleavetwice\criticalsectionleavetwice\criticalsectionleavetwice.cpp @ 24]
0:000> ?? CriticalSection
struct _RTL_CRITICAL_SECTION
+0x000 DebugInfo : 0xffffffff _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount : 0n-2
+0x008 RecursionCount : 0n2
+0x00c OwningThread : 0x00000d6c Void
+0x010 LockSemaphore : (null)
+0x014 SpinCount : 0x20007d0
Regarding 3)
Why do the LockCount values in the aforementioned listings at odds with each other? (one has 0 and the other has -2.)
The field LockCount
is not a true lock count any more, as explained in
this answer. Relevant part:
In Microsoft Windows Server 2003 Service Pack 1 and later versions of Windows, the LockCount field is parsed as follows:
- The lowest bit shows the lock status. If this bit is 0, the critical section is locked; if it is 1, the critical section is not locked.
- The next bit shows whether a thread has been woken for this lock. If this bit is 0, then a thread has been woken for this lock; if it is 1, no thread has been woken.
- The remaining bits are the ones-complement of the number of threads waiting for the lock.
IMHO the !locks
command should do the interpretation for you.
The particular value of -2
is 11111111 ... 1111110
in binary, so the last bit is 0, meaning that the critical section is locked. The bit before is 1, so no thread has been woken. The one's complement of the remaining is 0, which corresponds to the lock count output of !locks
.
This means, that the critical section you analyzed is not involved in a deadlock.
Regarding 4)
I think I understand LockCount well enough to know it can't go below -1. Not to mention RecursionCount seems to be badly out of step with LockCount.
See #3.
Regarding unnumbered question 5)
should I just chalk this up to corrupted CS?
No. It does not seem to be corrupted.
To check if it's responsible for a deadlock, I suggest using sosex's (download) !dlk
command. Although it's an extension for .NET primarily, I have once requested to make it work for critical sections even without .NET - and the feature was implemented in one of the newer versions of SOSex.
If it could not find .NET, it'll output a warning about that fact and then continue analyzing the critical sections:
In your case it may look like this:
0:000> !dlk
Unable to initialize .NET data interface. The CLR has not yet been loaded in the process (mscorwks/clr module not loaded).
Examining CriticalSections...
No deadlocks detected.
Usage:
.load c:\path\to\sosex.dll
!dlk
If it identifies a deadlock, it's very easy to read. If it does not, you still need to apply other techniques, i.e. it's not a guarantee that there's no deadlock of some other kind (e.g. in case the wait chain includes other types of synchronization objects like threads, events, ...).
Example output (not for critical section, but will be similar):
0:010> !dlk
Deadlock detected:
CLR thread 4 holds sync block 00000000024c6970 OBJ:000000007fff0f80[System.String] STRVAL=SYNC1
waits sync block 00000000024c6928 OBJ:000000007fff0fa8[System.String] STRVAL=SYNC2
CLR thread 5 holds sync block 00000000024c6928 OBJ:000000007fff0fa8[System.String] STRVAL=SYNC2
waits sync block 00000000024c6970 OBJ:000000007fff0f80[System.String] STRVAL=SYNC1
CLR Thread 4 is waiting at ConsoleTestApp.ConsoleTestApp.MonitorDeadlockThreadProc()+0xa4(IL) [C:\dev\ConsoleTestApp\ConsoleTestApp.cs, line 195]
CLR Thread 5 is waiting at ConsoleTestApp.ConsoleTestApp.MonitorDeadlockThreadProc()+0xa4(IL) [C:\dev\ConsoleTestApp\ConsoleTestApp.cs, line 195]
1 deadlock detected.
!analyze -hang
and Debug Diag may help in hang analysis.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With