
How to debug corruption in the managed heap

My program throws an exception that a catch(Exception e) block cannot handle, and then it crashes:

Access Violation Corrupted State Exception.

This is weird because, as far as I know, corrupted state exceptions are thrown from unmanaged code, while here I get the exception when calling a StringBuilder method.

The code runs in a background thread and crashes from time to time; the crash cannot be easily reproduced. So I attached WinDbg to the process and got the following stack for the exception:

    000000001dabd8c8 000007feea129a1d [HelperMethodFrame: 000000001dabd8c8]
    000000001dabda00 000007fee90cfce8 System.Text.StringBuilder.ExpandByABlock(Int32)
    000000001dabda40 000007fee90cfba4 System.Text.StringBuilder.Append(Char*, Int32)
    000000001dabdaa0 000007fee9102955 System.Text.StringBuilder.Append(System.String, Int32, Int32)
    000000001dabdaf0 000007ff00bf5ce3 MineUtils.Common.Strings.Strings.Replace(System.String, System.String, System.String, Boolean, Boolean)
    000000001dabdb90 000007ff00bf5a59 MineUtils.Common.Strings.Strings.RemoveSubstrings(System.String, System.String, System.String, Boolean) [D:\Programs\Visual Studio 2005 Projects\MineUtils.Common\Strings\Strings.Common-Main.cs @ 1481

WinDbg shows this exception occurred:

    EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
    ExceptionAddress: 000007feea129a1d (clr!WKS::gc_heap::find_first_object+0x0000000000000092)
       ExceptionCode: c0000005 (Access violation)
      ExceptionFlags: 00000000
    NumberParameters: 2
       Parameter[0]: 0000000000000000
       Parameter[1]: 0000000000003d80
    Attempt to read from address 0000000000003d80

I read that such exceptions can be handled with the [HandleProcessCorruptedStateExceptions] method attribute, but why does this exception occur at all if I only use StringBuilder?

This is the previous WinDbg analysis (StringBuilder.ToString() causes the exception):

    *******************************************************************************
    *                                                                             *
    *                        Exception Analysis                                   *
    *                                                                             *
    *******************************************************************************

    FAULTING_IP:
    clr!WKS::gc_heap::find_first_object+92
    000007fe`ea129a1d f70100000080    test    dword ptr [rcx],80000000h

    EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
    ExceptionAddress: 000007feea129a1d (clr!WKS::gc_heap::find_first_object+0x0000000000000092)
       ExceptionCode: c0000005 (Access violation)
      ExceptionFlags: 00000001
    NumberParameters: 2
       Parameter[0]: 0000000000000000
       Parameter[1]: 0000000000001c98
    Attempt to read from address 0000000000001c98

    ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

    EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

    EXCEPTION_PARAMETER1:  0000000000000000

    EXCEPTION_PARAMETER2:  0000000000001c98

    READ_ADDRESS:  0000000000001c98

    FOLLOWUP_IP:
    clr!WKS::gc_heap::find_first_object+92
    000007fe`ea129a1d f70100000080    test    dword ptr [rcx],80000000h

    MOD_LIST: <ANALYSIS/>

    NTGLOBALFLAG:  0

    APPLICATION_VERIFIER_FLAGS:  0

    MANAGED_STACK:
    (TransitionMU)
    000000001AB7DFC0 000007FEE90CFE07 mscorlib_ni!System.Text.StringBuilder.ToString()+0x27
    000000001AB7E010 000007FF00C750A9 SgmlReaderDll!Sgml.Entity.ScanToken(System.Text.StringBuilder, System.String, Boolean)+0x169
    000000001AB7E080 000007FF00C760E6 SgmlReaderDll!Sgml.SgmlDtd.ParseParameterEntity(System.String)+0xc6
    000000001AB7E0F0 000007FF00C76FD8 SgmlReaderDll!Sgml.SgmlDtd.ParseModel(Char, Sgml.ContentModel)+0x298
    000000001AB7E160 000007FF00C7701C SgmlReaderDll!Sgml.SgmlDtd.ParseModel(Char, Sgml.ContentModel)+0x2dc
    000000001AB7E1D0 000007FF00C7701C SgmlReaderDll!Sgml.SgmlDtd.ParseModel(Char, Sgml.ContentModel)+0x2dc
    000000001AB7E240 000007FF00C76BA5 SgmlReaderDll!Sgml.SgmlDtd.ParseContentModel(Char)+0x65
    000000001AB7E290 000007FF00C763D7 SgmlReaderDll!Sgml.SgmlDtd.ParseElementDecl()+0xe7
    000000001AB7E320 000007FF00C747A1 SgmlReaderDll!Sgml.SgmlDtd.Parse()+0xc1
    000000001AB7E370 000007FF00C73EF5 SgmlReaderDll!Sgml.SgmlDtd.Parse(System.Uri, System.String, System.IO.TextReader, System.String, System.String, System.Xml.XmlNameTable)+0x175
    000000001AB7E410 000007FF00C73B33 SgmlReaderDll!Sgml.SgmlReader.LazyLoadDtd(System.Uri)+0x163
    000000001AB7E480 000007FF00C737B9 SgmlReaderDll!Sgml.SgmlReader.OpenInput()+0x19
    000000001AB7E4E0 000007FF00C7334C SgmlReaderDll!Sgml.SgmlReader.Read()+0x1c
    000000001AB7E530 000007FEE5983C4C System_Xml_ni!System.Xml.XmlLoader.Load(System.Xml.XmlDocument, System.Xml.XmlReader, Boolean)+0xac
    000000001AB7E590 000007FEE5983730 System_Xml_ni!System.Xml.XmlDocument.Load(System.Xml.XmlReader)+0x90
    ...
    000000001AB7F0A0 000007FEE97ED792 mscorlib_ni!System.Threading.Tasks.Task.Execute()+0x82
    000000001AB7F100 000007FEE90A181C mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0xdc
    000000001AB7F160 000007FEE97E7F95 mscorlib_ni!System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)+0x1b5
    000000001AB7F1E0 000007FEE97E7D90 mscorlib_ni!System.Threading.Tasks.Task.ExecuteEntry(Boolean)+0xb0
    000000001AB7F220 000007FEE90EBA83 mscorlib_ni!System.Threading.ThreadPoolWorkQueue.Dispatch()+0x193
    000000001AB7F2C0 000007FEE90EB8D5 mscorlib_ni!System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()+0x35
    (TransitionUM)

    EXCEPTION_OBJECT: !pe 2a61228
    Exception object: 0000000002a61228
    Exception type:   System.ExecutionEngineException
    Message:          <none>
    InnerException:   <none>
    StackTrace (generated):
    <none>
    StackTraceString: <none>
    HResult: 80131506

    MANAGED_OBJECT_NAME:  System.ExecutionEngineException

    MANAGED_STACK_COMMAND:  _EFN_StackTrace

    LAST_CONTROL_TRANSFER:  from 000007feea12bce4 to 000007feea129a1d

    ADDITIONAL_DEBUG_TEXT:  Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]

    FAULTING_THREAD:  ffffffffffffffff

    DEFAULT_BUCKET_ID:  INVALID_POINTER_READ_CALL

    PRIMARY_PROBLEM_CLASS:  INVALID_POINTER_READ_CALL

    BUGCHECK_STR:  APPLICATION_FAULT_INVALID_POINTER_READ_WRONG_SYMBOLS_CALL__SYSTEM.EXECUTIONENGINEEXCEPTION

UPDATED AGAIN

Here is the WinDbg stack of the exception after I enabled paged heap:

    (1480.e84): Access violation - code c0000005 (first chance)
    ntdll!ZwTerminateProcess+0xa:
    00000000`77c415da c3              ret
    0:023> !clrstack
    OS Thread Id: 0xe84 (23)
    Child SP         IP               Call Site
    0000000037ded848 0000000077c415da [HelperMethodFrame: 0000000037ded848]
    0000000037dedab0 000007fee9effd17 System.Text.StringBuilder.ToString()
    *** WARNING: Unable to verify checksum for C:\Windows\assembly\NativeImages_v4.0.30319_64\mscorlib\8f7f691aa155c11216387cf3420d9d1b\mscorlib.ni.dll
    0000000037dedb00 000007ff00cceae9 Sgml.Entity.ScanToken(System.Text.StringBuilder, System.String, Boolean)
    0000000037dedb70 000007ff00cd19b2 Sgml.SgmlDtd.ParseAttDefault(Char, Sgml.AttDef)
    0000000037dedbc0 000007ff00cd120b Sgml.SgmlDtd.ParseAttDef(Char)
    0000000037dedc00 000007ff00cd1057 Sgml.SgmlDtd.ParseAttList(System.Collections.Generic.Dictionary`2<System.String,Sgml.AttDef>, Char)
    0000000037dedc50 000007ff00cd10cd Sgml.SgmlDtd.ParseAttList(System.Collections.Generic.Dictionary`2<System.String,Sgml.AttDef>, Char)
    0000000037dedca0 000007ff00cd0e9a Sgml.SgmlDtd.ParseAttList()
    0000000037dedd10 000007ff00cce1f1 Sgml.SgmlDtd.Parse()
    0000000037dedd60 000007ff00ccd945 Sgml.SgmlDtd.Parse(System.Uri, System.String, System.IO.TextReader, System.String, System.String, System.Xml.XmlNameTable)
    0000000037dede00 000007ff00ccd582 Sgml.SgmlReader.LazyLoadDtd(System.Uri)
    0000000037dede70 000007ff00ccd1f9 Sgml.SgmlReader.OpenInput()
    0000000037deded0 000007ff00cccd8c Sgml.SgmlReader.Read()
    0000000037dedf20 000007fee67b3bfc System.Xml.XmlLoader.Load(System.Xml.XmlDocument, System.Xml.XmlReader, Boolean)
    *** WARNING: Unable to verify checksum for C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Xml\8e4323f5bfb90be4621456033d8b404b\System.Xml.ni.dll
    *** ERROR: Module load completed but symbols could not be loaded for C:\Windows\assembly\NativeImages_v4.0.30319_64\System.Xml\8e4323f5bfb90be4621456033d8b404b\System.Xml.ni.dll
    0000000037dedf80 000007fee67b36e0 System.Xml.XmlDocument.Load(System.Xml.XmlReader)
    [deleted]
    0000000037deea90 000007feea61d432 System.Threading.Tasks.Task.Execute()
    0000000037deeaf0 000007fee9ed17ec System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
    0000000037deeb50 000007feea617c35 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
    0000000037deebd0 000007feea617a30 System.Threading.Tasks.Task.ExecuteEntry(Boolean)
    0000000037deec10 000007fee9f1b953 System.Threading.ThreadPoolWorkQueue.Dispatch()
    0000000037deecb0 000007fee9f1b7a5 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
    0000000037def310 000007feeae4dc54 [DebuggerU2MCatchHandlerFrame: 0000000037def310]
    0:023> !verifyheap
    -verify will only produce output if there are errors in the heap
    The garbage collector data structures are not in a valid state for traversal.
    It is either in the "plan phase," where objects are being moved around, or
    we are at the initialization or shutdown of the gc heap. Commands related to
    displaying, finding or traversing objects as well as gc heap segments may not
    work properly. !dumpheap and !verifyheap may incorrectly complain of heap
    consistency errors.
    object 000000000e34caf8: bad member 000000001024b9a0 at 000000000e34cb08
    curr_object:      000000000e34caf8
    Last good object: 000000000e34cab0
    ----------------
    0:023> !analyze
    Last event: 1480.e84: Exit process 0:1480, code 80131506
      debugger time: Sun Sep 18 14:22:42.592 2011 (UTC + 1:00)
    0:023> !analyze -v
    Last event: 1480.e84: Exit process 0:1480, code 80131506
      debugger time: Sun Sep 18 14:22:42.592 2011 (UTC + 1:00)
    0:023> .do e34cab0
              ^ Syntax error in '.do e34cab0'
    0:023> !do e34cab0
    Name:        System.String
    MethodTable: 000007feea026870
    EEClass:     000007fee9baed58
    Size:        72(0x48) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    String:      appliedFiltersContainer
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    000007feea02c758  4000103        8         System.Int32  1 instance               23 m_stringLength
    000007feea02b298  4000104        c          System.Char  1 instance               61 m_firstChar
    000007feea026870  4000105       10        System.String  0   shared           static Empty
                                     >> Domain:Value  00000000021343a0:000000000db21420 <<
    0:023> !do e34caf8
    <Note: this object has an invalid CLASS field>
    Name:        System.Reflection.RuntimeAssembly
    MethodTable: 000007feea02a128
    EEClass:     000007fee9baf968
    Size:        48(0x30) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    000007feea9ef7f0  4000e14        8 ...solveEventHandler  0 instance 0000000000000000 _ModuleResolve
    000007feea036338  4000e15       10 ...che.InternalCache  0 instance 000000001024b9a0 m_cachedData
    000007feea0259c8  4000e16       18        System.Object  0 instance 000000000e3abd18 m_syncRoot
    000007feea033450  4000e17       20        System.IntPtr  1 instance         37a95f10 m_assembly

What can it be?

net_prog asked Aug 15 '11 12:08




2 Answers

Recently, I was faced with a managed heap corruption, which was new to me. It was very frustrating, and I had to learn many things to be able to debug it. I want to thank Seva Titov, who pointed me in the right direction; his answer was concise and very helpful. I am logging the actions I took to debug the problem for my own reference. This will probably also help others who are new to this.

Debug Heap Corruption in .NET 4:

How to recognize a suspected heap corruption?

Briefly:

  1. The application crashes randomly regardless of any exception handling, and the exception even passes through blanket handlers such as catch(Exception), which are supposed to catch all exceptions.

  2. Examining the CLR stack in the application crash dumps shows the garbage collector on the top of the stack:

    000000001dabd8c8 000007feea129a1d [HelperMethodFrame: 000000001dabd8c8]
    000000001dabda00 000007fee90cfce8 System.Text.StringBuilder.ExpandByABlock(Int32)
    000000001dabda40 000007fee90cfba4 System.Text.StringBuilder.Append(Char*, Int32)
    ...

    EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
    ExceptionAddress: 000007feea129a1d (clr!WKS::gc_heap::find_first_object+0x0000000000000092)
       ExceptionCode: c0000005 (Access violation)
      ExceptionFlags: 00000000
    NumberParameters: 2
       Parameter[0]: 0000000000000000
       Parameter[1]: 0000000000003d80
    ...

  3. The CLR stack shows a different crash location each time, or the code it shows is clearly irrelevant, such as a StringBuilder method reported as the cause of the exception.

For more details refer to .NET Crash: Managed Heap Corruption calling unmanaged code.

Proceed step by step; move to the next step only if the previous one does not help.

Step 1. Check the code.

Check the code for unsafe or native code usages:

  1. Review the code for unsafe blocks and DllImport declarations.
  2. Download .NET Reflector and use it to analyze the application assemblies for PInvoke. In the same way, analyze the third-party assemblies which are used by the application.

If unsafe or native code usage is found, pay extra attention to it. The most common cause of heap corruption in such cases is a buffer overflow or an argument type mismatch. Ensure that any buffer supplied to native code to fill is big enough and that all arguments passed to native code are of the expected type.
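As an illustration of the buffer-size point, here is a minimal sketch of a P/Invoke call where the size argument is always derived from the buffer that is actually passed. GetWindowsDirectoryW is a real kernel32 function, but the wrapper class WindowsDirectorySketch and its Get method are hypothetical names made up for this example, and the code only runs on Windows.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical wrapper, for illustration only.
static class WindowsDirectorySketch
{
    // Win32 signature: UINT GetWindowsDirectoryW(LPWSTR lpBuffer, UINT uSize).
    // uSize is the buffer size in characters, so it must describe the
    // buffer that is really passed in the first argument.
    [DllImport("kernel32.dll", EntryPoint = "GetWindowsDirectoryW",
               CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern uint GetWindowsDirectory(StringBuilder buffer, uint size);

    public static string Get()
    {
        var buffer = new StringBuilder(260);

        // The size comes from the same StringBuilder, never from a separate constant.
        uint length = GetWindowsDirectory(buffer, (uint)buffer.Capacity);

        if (length > (uint)buffer.Capacity)
        {
            // The buffer was too small: the return value is the required size, so retry once.
            buffer = new StringBuilder((int)length);
            length = GetWindowsDirectory(buffer, (uint)buffer.Capacity);
        }

        if (length == 0)
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }

        return buffer.ToString();
    }
}
```

When reviewing existing DllImport declarations, the thing to look for is the opposite pattern: a hard-coded size that is larger than the buffer, or a managed parameter type whose width differs from the native one.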

Step 2. Check if this corrupted state exception can be caught.

To handle such exceptions, one needs to decorate the method that contains the catch(Exception) statement with the [HandleProcessCorruptedStateExceptions] attribute or add the following to the app.config file:

    <configuration>
        <runtime>
            <legacyCorruptedStateExceptionsPolicy enabled="true" />
        </runtime>
    </configuration>

If the exception is caught successfully, you can log and examine it; this means that the problem is not a corrupted heap.

Corrupted heap exceptions cannot be handled at all: HandleProcessCorruptedStateExceptions doesn't seem to work.

For more information on corrupted state exceptions, see All about Corrupted State Exceptions in .NET4.
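The attribute variant of this step might look like the sketch below, written for .NET Framework 4 as in the rest of this answer. The class CorruptedStateProbe, the method TryRun and its parameters are hypothetical names chosen for the example; only the two attributes come from the framework.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Security;

// Hypothetical helper, for illustration only.
static class CorruptedStateProbe
{
    // With this attribute, the catch block below is also offered corrupted
    // state exceptions such as access violations on .NET Framework 4.
    [HandleProcessCorruptedStateExceptions]
    [SecurityCritical]
    public static bool TryRun(Action work, Action<Exception> log)
    {
        try
        {
            work();
            return true;
        }
        catch (Exception ex)
        {
            // Log only; the process state may be damaged, so do as little as possible here.
            log(ex);
            return false;
        }
    }
}
```

If the crash still bypasses this handler, that is consistent with the heap corruption case described above, and the later steps apply.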

Step 3. Live debugging.

In this step, we debug the crashing application live in the production environment (or where we can reproduce the crash).

Download Debugging Tools for Windows from Microsoft Windows SDK for Windows 7 and .NET Framework 4 (a web installer will be downloaded which will allow selecting the required components to install - mark all components). It will install both 32 and 64 bit (if your system is x64) versions of the required debugging tools.

Here one needs to know how to attach WinDbg to a live process, how to take crash dumps and examine them, and how to load the SOS extension in WinDbg (google for details).

Enable debugging helpers:

  1. Launch Application Verifier (C:\Program Files\Application Verifier - use the required edition, either x86 or x64, depending on your executable compilation mode), add your executable there in the left pane and in the right pane check one node "Basics / Heaps". Save the changes.

  2. Launch Global Flags helper (C:\Program Files\Debugging Tools for Windows\gflags.exe - again select the correct edition, x86 or x64). Once Global Flags is started, go to the "Image File" tab and at the top text box enter the name of your executable file without any paths (for example, "MyProgram.exe"). Then press the Tab key and set the following boxes:

    • Enable heap tail checking
    • Enable heap free checking
    • Enable heap parameter checking
    • Enable heap validation on call
    • Disable heap coalesce on free
    • Enable page heap
    • Enable heap tagging
    • Enable application verifier
    • Debugger (type the path to the installed WinDbg in the text box to the right, for example, C:\Program Files\Debugging Tools for Windows (x64)\windbg.exe -g).

    For more details, refer to Heap Corruption, Part 2.

  3. Go to "Control Panel/System and Security/System" (or right-click "Computer" in the Start menu and select "Properties"). There, click "Advanced system settings"; in the dialog that opens, go to the "Advanced" tab and click the "Environment Variables" button. In the next dialog, add a new System variable if you are a system administrator, or a User variable otherwise (in which case you need to log out and log back in). The required variable is "COMPLUS_HeapVerify" with a value of "1". More details can be found in the Stack Overflow question .NET/C#: How to set debugging environment variable COMPLUS_HeapVerify?.
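Because a newly added variable is not visible to processes that were started before the change (or before logging in again), it can be worth confirming that the application really sees it. A minimal sketch, assuming you can add a small check at startup; the class HeapVerifyCheck and its methods are hypothetical names:

```csharp
using System;

// Hypothetical startup check, for illustration only.
static class HeapVerifyCheck
{
    // Returns true when the raw variable value is "1", the value used in this step.
    public static bool IsEnabled(string rawValue)
    {
        return rawValue != null && rawValue.Trim() == "1";
    }

    // Reads COMPLUS_HeapVerify as seen by the current process.
    public static bool IsEnabledForThisProcess()
    {
        return IsEnabled(Environment.GetEnvironmentVariable("COMPLUS_HeapVerify"));
    }
}
```

Logging the result of IsEnabledForThisProcess() once at startup shows whether the setting reached the process before you spend time waiting for a crash.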

Now we are ready to start debugging. Start the application. WinDbg should start automatically for it. Leave the application running until it crashes into WinDbg and then examine the dump.

TIP: To quickly remove Global Flags, Application Verifier and the debugger attachment settings, delete the following key in the registry: x64 - HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\*YourAppName*

Step 4. Enable MDAs.

Try to use the Managed Debugging Assistants. Details are in Stack Overflow question What MDAs are useful to track a heap corruption?.

MDAs must be used along with WinDbg. I used them even along with Global Flags and Application Verifier.

Step 5. Enable GCStress.

Using GCStress is an extreme option, because the application becomes almost unusable, but it is still a way to go. More details are in GCStress: How to turn on in Windows 7?.

Step 6. Compile for x86.

If your application is currently being compiled for "Any CPU" or "x64" platform, try to compile it for "x86" if there is no difference for you which platform to use. I saw this reported to solve the problem for somebody.

Step 7. Disable concurrent GC - this is what worked for me

There is a known issue in .NET 4, reported in the thread Access Violation in .NET 4 Runtime in gc_heap::garbage_collect with no unmanaged modules. The problem can be solved by disabling concurrent GC in the app.config file:

    <?xml version="1.0"?>
    <configuration>
        <runtime>
            <gcConcurrent enabled="false" />
        </runtime>
    </configuration>
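
To confirm that the configuration change was picked up, the GC mode can be read back at runtime. This is a minimal sketch using the framework's GCSettings class; the wrapper GcModeReport and its Describe method are hypothetical names. As far as I understand the documentation, the latency mode should read Batch when concurrent GC is disabled and Interactive when it is enabled, but treat that mapping as an assumption to verify on your runtime.

```csharp
using System;
using System.Runtime;

// Hypothetical diagnostic helper, for illustration only.
static class GcModeReport
{
    // Builds a one-line description of the GC flavor and latency mode of the current process.
    public static string Describe()
    {
        string flavor = GCSettings.IsServerGC ? "server" : "workstation";
        return "GC flavor: " + flavor + ", latency mode: " + GCSettings.LatencyMode;
    }
}
```

Writing GcModeReport.Describe() to the application log at startup also records whether server or workstation GC is in use, which is useful when comparing crash reports from different machines.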
net_prog answered Sep 27 '22 18:09



You have managed heap corruption. It is not easy to find the root cause of managed heap corruption, because the problem usually manifests long after the heap is corrupted. In your case, the StringBuilder is a red herring; the corruption happened some time earlier.

What I would do is the following:

  1. Check if you have any unsafe C# code. If you have any, double check the logic there.
  2. Enable paged heap for your application. Running it with paged heap will help uncover problems with unmanaged code -- in case unmanaged code is corrupting the managed heap.
  3. Run !VerifyHeap in different places. This way you might be able to localize the place in your code where corruption happens.
  4. If you have the server type of garbage collection enabled for your application, temporarily change that to workstation garbage collection -- you will get more predictable behavior this way.
  5. Read through Tess's blog post .NET Crash: Managed Heap Corruption calling unmanaged code. It demonstrates some examples of managed heap corruption.

Note that when you run your code under WinDbg, you will come across occasional first-chance AVs. It is safe to ignore them: just type sxd av once you attach WinDbg to the process, and investigate only second-chance AVs.

seva titov answered Sep 27 '22 19:09
