Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

"DebugInfo for CritSec does not point back to the critical section" when analysing deadlock

I'm using Windbg to analyse a deadlock occurring in an data-snap application server written in delphi.

When I run

!analyze -hang -v

I get this

:000:x86> !analyze -hang -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

GetPageUrlData failed, server returned HTTP status 404
URL requested: http://watson.microsoft.com/00000000.htm?Retriage=1

FAULTING_IP: 
+6ced240
00000000 ??              ???

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 0000000000000000
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 0

FAULTING_THREAD:  0000000000000000

BUGCHECK_STR:  HANG

DEFAULT_BUCKET_ID:  APPLICATION_HANG

PROCESS_NAME:  ********.exe

ERROR_CODE: (NTSTATUS) 0xcfffffff - 

EXCEPTION_CODE: (NTSTATUS) 0xcfffffff - 

MOD_LIST: 

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

DERIVED_WAIT_CHAIN:  

Dl Eid Cid     WaitType
-- --- ------- --------------------------
   0   c7c.2634 Critical Section       

WAIT_CHAIN_COMMAND:  ~0s;k;;

BLOCKING_THREAD:  0000000000002634

PRIMARY_PROBLEM_CLASS:  APPLICATION_HANG

LAST_CONTROL_TRANSFER:  from 0000000077138df4 to 000000007711f8b1

STACK_TEXT:  
0018fc50 77138df4 00000c6c 00000000 00000000 ntdll_77100000!NtWaitForSingleObject+0x15
0018fcb4 77138cd8 00000000 00000000 03fe0940 ntdll_77100000!RtlpWaitOnCriticalSection+0x13e
0018fcdc 7369324f 736a3134 00000000 03fe0940 ntdll_77100000!RtlEnterCriticalSection+0x150
WARNING: Stack unwind information not available. Following frames may be wrong.
0018fcec 7369af5f 00000388 00000000 003d1e00 mswsock!GetLspGuid+0x19af
0018fd08 76366958 00000388 0018fd84 0018fd9c mswsock!GetLspGuid+0x96bf
0018fd38 0018fd58 763668cd 00000388 0018fd84 ws2_32!WSAAccept+0x84
00000000 00000000 00000000 00000000 00000000 0x18fd58


FOLLOWUP_IP: 
mswsock!GetLspGuid+19af
7369324f 33db            xor     ebx,ebx

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  mswsock!GetLspGuid+19af

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: C:\Windows\System32\mswsock

IMAGE_NAME:  lld

DEBUG_FLR_IMAGE_TIMESTAMP:  4ce7c83d

STACK_COMMAND:  ~0s ; kb

FAILURE_BUCKET_ID:  APPLICATION_HANG_cfffffff_lld!Unloaded

BUCKET_ID:  X64_HANG_mswsock!GetLspGuid+19af

WATSON_STAGEONE_URL:  http://watson.microsoft.com/00000000.htm?Retriage=1

Followup: MachineOwner
---------

I then did

!locks -V

to see which lock it was waiting on and to my surprise it returned this,

0:000:x86> !locks -V

CritSec ntdll!RtlCriticalSectionLock+0 at 0000000077057060
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!LdrpLoaderLock+0 at 0000000077057490
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!RtlpDynamicFunctionTableLock+0 at 0000000077057468
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!FastPebLock+0 at 000000007705a900
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!RtlpProcessHeapsListLock+0 at 000000007705a240
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec +270208 at 0000000000270208
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    1

CritSec ntdll!EtwProvCritSect+0 at 000000007705a120
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec ntdll!EtwPrivSessionCritSect+0 at 000000007705a1e0
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec +10208 at 0000000000010208
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

CritSec +276f40 at 0000000000276f40
LockCount          NOT LOCKED
RecursionCount     0
OwningThread       0
EntryCount         0
ContentionCount    0

Scanned 10 critical sections

From looking at the call stack

STACK_TEXT:  
0018fc50 77138df4 00000c6c 00000000 00000000 ntdll_77100000!NtWaitForSingleObject+0x15
0018fcb4 77138cd8 00000000 00000000 03fe0940 ntdll_77100000!RtlpWaitOnCriticalSection+0x13e
0018fcdc 7369324f 736a3134 00000000 03fe0940 ntdll_77100000!RtlEnterCriticalSection+0x150
WARNING: Stack unwind information not available. Following frames may be wrong.
0018fcec 7369af5f 00000388 00000000 003d1e00 mswsock!GetLspGuid+0x19af
0018fd08 76366958 00000388 0018fd84 0018fd9c mswsock!GetLspGuid+0x96bf
0018fd38 0018fd58 763668cd 00000388 0018fd84 ws2_32!WSAAccept+0x84
00000000 00000000 00000000 00000000 00000000 0x18fd58

I determined it was waiting on a critical section at address 0x736a3134 (First parameter passed to RtlEnterCriticalSection) so I ran this

!critsec 736a3134

That gave me this output

0:000:x86> !critsec 736a3134

DebugInfo for CritSec at 00000000736a3134 does not point back to the critical section
NOT an initialized critical section.

CritSec mswsock!WSPStartup+6f64 at 00000000736a3134
WaiterWoken        Yes
LockCount          -1
RecursionCount     11028
OwningThread       c6c
EntryCount         1f49dad6
ContentionCount    88000000
*** Locked

Now the penny dropped, the pointer to the critical section has become corrupted, possibly due to concurrent thread access and lack of synchronisation elsewhere in the code

My question is how do I track down where this is or find out if it is another problem?

PS: this bug only appears when the application is under heavy load with maybe 700 clients connected

(it is using one thread per connection and I know 32bit applications will be limited to aprox 2000 threads at the default thread stack size and this is not the best approach)

PPS: I have multiple crash dumps where the application is hung waiting on different critical sections, in each case the pointer for the critical section appears not to point to a critical section.

like image 660
Mike Taylor Avatar asked Oct 23 '22 11:10

Mike Taylor


2 Answers

Looking at the output of !analyze -hang -v, it seems you are not using Application Verifier. I would recommend you to collect a hang dump after enabling application verifier "Locks" option. It would certainly give you more information for troubleshooting.

You can download Application Verifier from here:

http://www.microsoft.com/en-us/download/details.aspx?id=20028

More information:

http://msdn.microsoft.com/en-us/library/windows/desktop/dd371695(v=vs.85).aspx

like image 127
IUnknown Avatar answered Oct 27 '22 10:10

IUnknown


Just to let you know we gave up trying to find out what was causing this. As it only happened when the program was close to its Max virtual memory space (2.1GB 32bit app) due to the one thread per connection approach we where using.

In the end we re-designed the clients so they are not using this server application any-more but now use a SOAP server instead.

The SOAP server seams to scale much better than the datasnap/Midas we where using although we are still to test it at the clients site where the initial problem appeared.

like image 43
Mike Taylor Avatar answered Oct 27 '22 09:10

Mike Taylor