
.NET application hangs with GC thread deadlock

We have a problem with our application, which uses a mixture of managed (C#) and unmanaged (C++) code. Basically, we have an exe that invokes a bunch of assemblies, and one of these assemblies is an MC++ wrapper of our C++ library. The application is a console app. Most of the time it works fine, but occasionally it hangs without any errors or exceptions.

Using memory dumps and symbols, we've been able to do some diagnosis in WinDbg, but I'm not sure whether what we are seeing is a deadlock. I've searched for the CLR method names that appear in the stacks but haven't found any cases where one thread tries to allocate memory and deadlocks with the GC.

So far I've tried WinDbg with the sos, sosex and psscor4 extensions. Interestingly, sosex has a command to check for deadlocks (!dlk), but it reports no deadlocks.
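Conceptually, a deadlock checker builds a wait-for graph (an edge from thread A to thread B means A is blocked until B releases or signals something) and looks for a cycle. The minimal C++ sketch below illustrates that idea; it is not sosex's actual implementation. It shows why such a check can come back clean: it only sees edges for the lock types it knows how to read. As far as I know, !dlk inspects managed locks (monitors/sync blocks and ReaderWriterLocks), so waits on the CLR's internal Crst and CLREvent objects would not show up as edges. The thread numbers 5 and 12 come from the dumps further down in this post. `WaitForGraph`, `has_cycle_from` and `has_deadlock` are hypothetical names.

```cpp
#include <map>
#include <set>
#include <vector>

// Wait-for graph: key = waiting thread, value = threads it is blocked on.
// Thread IDs here are debugger thread numbers (hypothetical model).
using WaitForGraph = std::map<int, std::vector<int>>;

// Depth-first search: true if a cycle is reachable from `node`.
static bool has_cycle_from(const WaitForGraph& g, int node,
                           std::set<int>& on_stack, std::set<int>& done) {
    if (on_stack.count(node)) return true;   // back edge: node is already on the current path
    if (done.count(node)) return false;      // fully explored earlier, no cycle through it
    on_stack.insert(node);
    auto it = g.find(node);
    if (it != g.end()) {
        for (int next : it->second) {
            if (has_cycle_from(g, next, on_stack, done)) return true;
        }
    }
    on_stack.erase(node);
    done.insert(node);
    return false;
}

// True if any set of threads is waiting on each other in a cycle.
bool has_deadlock(const WaitForGraph& g) {
    std::set<int> on_stack, done;
    for (const auto& entry : g) {
        if (has_cycle_from(g, entry.first, on_stack, done)) return true;
    }
    return false;
}
```

With only managed-lock waits there are no edges here, so `has_deadlock` returns false. Adding the two CLR-internal waits from the dumps (12 -> 5 for the critical section owned by thread 5, and 5 -> 12 for the user_thread_wait on the background GC) closes the cycle, and it returns true.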

It's hard to post the code because it's a large and complex app. There is a mixture of .NET 3.5 and 4.0 assemblies, and there are threads in both managed and unmanaged code.

I would appreciate it if someone could look at the stack traces and confirm whether this is a possible deadlock with the GC thread. Or, even better, suggest some other way of debugging deadlocks/hangs in .NET apps that use C# and MC++.

Here's what I have so far:

List of threads when the app hangs (!threads):

ThreadCount:      8
UnstartedThread:  0
BackgroundThread: 5
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                           PreEmptive                                                   Lock
       ID  OSID        ThreadOBJ     State GC       GC Alloc Context                  Domain           Count APT Exception
   0    1   de0 00000000008069f0      a020 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 MTA
   2    2  2130 000000000080bd30      b220 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 MTA (Finalizer)
   4    3  14fc 000000001d182880   200b020 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 MTA
   5    4  20d0 000000001d18b400      b220 Enabled  0000000000000000:0000000000000000 00000000007fa280     2 MTA (GC)
   6    5  18a8 000000001d19f6a0      b020 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 MTA
   7    6  18a0 000000001d1c6f10       220 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 Ukn
   8    7  12f4 000000001d1c1ee0       220 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 Ukn
  10    8  2170 000000001d1c2ad0       220 Enabled  0000000000000000:0000000000000000 00000000007fa280     0 Ukn

       OSID     Special thread type
    1   2570    DbgHelper 
    2   2130    Finalizer 
    5   20d0    SuspendEE 
   12   1890    GC

This is what the stack of the GC thread looks like:

OS Thread Id: 0x1890 (12)
Child-SP         RetAddr          Call Site
0000000023e9f898 000000007799e4e8 ntdll!ZwWaitForSingleObject+0xa
0000000023e9f8a0 000000007799e3db ntdll!RtlpWaitOnCriticalSection+0xe8
0000000023e9f950 000007fef95d603e ntdll!RtlEnterCriticalSection+0xd1
0000000023e9f980 000007fef947bc41 clr!UnsafeEEEnterCriticalSection+0x1f
0000000023e9f9b0 000007fef947613a clr!CrstBase::Enter+0x1a1
0000000023e9f9f0 000007fef95da3a2 clr!ThreadStore::LockThreadStore+0x9a
0000000023e9fa20 000007fef9679675 clr!WKS::GCHeap::SuspendEE+0x82
0000000023e9fb20 000007fef9677eb2 clr!WKS::gc_heap::bgc_suspend_EE+0x25
0000000023e9fb50 000007fef98455b0 clr!WKS::gc_heap::background_mark_phase+0x236
0000000023e9fbb0 000007fef9677b76 clr! ?? ::FNODOBFM::`string'+0x9f85d
0000000023e9fc00 00000000773d652d clr!WKS::gc_heap::gc_thread_function+0xd3
0000000023e9fc30 000000007797c521 KERNEL32!BaseThreadInitThunk+0xd
0000000023e9fc60 0000000000000000 ntdll!RtlUserThreadStart+0x1d

To me it looks like the GC thread is waiting on a critical section. We were able to find the critical section's address and then its owner thread (!critsec). The owner thread's stack looked something like the following; I've trimmed it to keep this post short. (!dumpstack)

OS Thread Id: 0x20d0 (5)
Child-SP         RetAddr          Call Site
000000001fc5dd38 000007fefe0510dc ntdll!ZwWaitForSingleObject+0xa
000000001fc5dd40 000007fef9478817 KERNELBASE!WaitForSingleObjectEx+0x79
000000001fc5dde0 000007fef94787c0 clr!CLREvent::WaitEx+0x170
000000001fc5de20 000007fef947866b clr!CLREvent::WaitEx+0xf8
000000001fc5de80 000007fef967a15b clr!CLREvent::WaitEx+0x5e
000000001fc5df20 000007fef967a001 clr!WKS::gc_heap::user_thread_wait+0x49
000000001fc5df50 000007fef95dbb4e clr! ?? ::FNODOBFM::`string'+0x9fcc4
000000001fc5e030 000007fef95da22e clr!WKS::GCHeap::GarbageCollectGeneration+0x14e
000000001fc5e080 000007fef95d9e4e clr!WKS::gc_heap::try_allocate_more_space+0x25f
000000001fc5e150 000007fef95d9fc8 clr!WKS::GCHeap::Alloc+0x7e
000000001fc5e180 000007fef947407c clr!AllocateArrayEx+0xa6b
000000001fc5e2f0 000007fef8555b75 clr!JIT_NewArr1+0x45c
000000001fc5e4c0 000007fef8561103 mscorlib_ni!System.Reflection.CustomAttributeData.GetCustomAttributeRecords(System.Reflection.RuntimeModule, Int32)+0x115
000000001fc5e590 000007fef855db55 mscorlib_ni!System.Reflection.CustomAttribute.IsCustomAttributeDefined(System.Reflection.RuntimeModule, Int32, System.RuntimeType, Boolean)+0x103
000000001fc5e720 000007fef856c8ac mscorlib_ni!System.Reflection.CustomAttribute.IsDefined(System.RuntimeType, System.RuntimeType, Boolean)+0x75
000000001fc5e770 000007fef857fe46 mscorlib_ni!System.Enum.InternalFormat(System.RuntimeType, System.Object)+0x2c
000000001fc5e7b0 000007fef8554f3b mscorlib_ni!System.Text.StringBuilder.AppendFormat(System.IFormatProvider, System.String, System.Object[])+0x2e6
000000001fc5e850 000007ff03c640fc mscorlib_ni!System.String.Format(System.IFormatProvider, System.String, System.Object[])+0x7b
000000001fc5e8b0 000007ff03c638a6 MyLibrary1!NumberCache.NumberEntry.ToString()+0x26c
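Read together, the two stacks suggest a lock-ordering cycle. Thread 5 appears to own the thread-store critical section while it waits in user_thread_wait for the background GC to finish, and the background GC thread (12) needs that same lock in SuspendEE before it can make progress. The C++ sketch below models that shape with standard-library primitives so it can be reasoned about in isolation. It is an assumption-laden model, not the CLR's code: std::timed_mutex stands in for the Crst taken in ThreadStore::LockThreadStore, a std::condition_variable stands in for the CLREvent, and timeouts replace the infinite waits so the program reports the cycle instead of hanging. `reproduce_wait_cycle` is a hypothetical name.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the wait cycle seen in threads 5 and 12.
// Returns true if each side gave up waiting on the other.
bool reproduce_wait_cycle() {
    using namespace std::chrono_literals;
    std::timed_mutex thread_store_lock;   // stand-in for the thread-store Crst
    std::mutex event_mutex;
    std::condition_variable bgc_done;     // stand-in for the "background GC finished" event
    bool bgc_finished = false;
    std::atomic<bool> lock_held{false};

    bool allocator_timed_out = false;
    bool gc_thread_timed_out = false;

    // Models thread 5: owns the thread-store lock, then waits for the background GC.
    std::thread allocator([&] {
        std::lock_guard<std::timed_mutex> store(thread_store_lock);
        lock_held = true;
        std::unique_lock<std::mutex> lk(event_mutex);
        // wait_for returns the predicate's value: false means we timed out.
        allocator_timed_out = !bgc_done.wait_for(lk, 1s, [&] { return bgc_finished; });
    });

    // Models thread 12: needs the thread-store lock (to suspend the EE) before it can finish.
    std::thread gc_thread([&] {
        while (!lock_held) std::this_thread::yield();  // ensure the allocator got the lock first
        std::unique_lock<std::timed_mutex> store(thread_store_lock, 100ms);  // try_lock_for
        if (!store.owns_lock()) {
            gc_thread_timed_out = true;  // without a timeout this would block forever
            return;
        }
        {
            std::lock_guard<std::mutex> lk(event_mutex);
            bgc_finished = true;
        }
        bgc_done.notify_all();
    });

    allocator.join();
    gc_thread.join();
    return allocator_timed_out && gc_thread_timed_out;
}
```

If the model is right, `reproduce_wait_cycle()` returns true: each side times out waiting on the other, which is the state the dump appears to capture with the timeouts removed.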
user1210698 asked Feb 15 '12 07:02


1 Answer

This line in the second callstack looks suspicious:

000000001fc5df50 000007fef95dbb4e clr! ?? ::FNODOBFM::`string'+0x9fcc4 

Look how large the offset is (0x9fcc4), and there is no real function name in that frame, only "?? ::FNODOBFM::`string'". Are you missing some symbols?

Maybe there is a finalizer in that library that is causing a problem.

JMarsch answered Sep 22 '22 15:09