
How to debug access violation 0xC0000005 in CorExitProcess on exit?

Our application (written in C++, VS 2010 project) has been running fine on all operating systems prior to Windows 8 (and still does). On Windows 8, however, when orderly exiting the application, an access violation occurs:

mfc100.dll!_DllMain@12()    <<< Crash here
mfc100.dll!__CRT_INIT@12()  
mfc100.dll!__DllMainCRTStartup@12() 
ntdll.dll!_LdrxCallInitRoutine@16() 
ntdll.dll!LdrpCallInitRoutine() 
ntdll.dll!LdrShutdownProcess()  
ntdll.dll!RtlExitUserProcess()  
kernel32.dll!_ExitProcessImplementation@4() 
mscoreei.dll!RuntimeDesc::ShutdownAllActiveRuntimes(unsigned int,class RuntimeDesc *,enum RuntimeDesc::ShutdownCompatMode)  
mscoreei.dll!_CorExitProcess@4()    
mscoree.dll!_ShellShim_CorExitProcess@4()   
msvcr100d.dll!__crtCorExitProcess(int status) line 693   C
msvcr100d.dll!__crtExitProcess(int status) line 699 C
msvcr100d.dll!doexit(int code, int quick, int retcaller) line 621   C
msvcr100d.dll!exit(int code) line 393  C
my.exe!__tmainCRTStartup() line 568    C
my.exe!WinMainCRTStartup() line 371    C
kernel32.dll!@BaseThreadInitThunk@12()  
ntdll.dll!__RtlUserThreadStart()    
ntdll.dll!__RtlUserThreadStart@8()  

In an MSDN forum topic it has been suggested to run GC.Collect() before exit, but adding such a call shortly before exit made no difference.

I am a bit at a loss as to how I should debug the problem. As far as I understand, CorExitProcess takes care of cleaning up the managed resources of the application. So could this be a fault in a managed component?
Or is it more likely that some function pointer in _DllMain has been overwritten/corrupted? If so, how would I set a data breakpoint at the address in question? There is a post explaining how to debug a similar issue, but its author has the issue in his own DLL, so he can actually peek at the exact source of the problem, which I can't.

Any suggestions?

Edit: Additional information, windbg !analyze -v:

FAULTING_IP: 
mfc100+258e6c
64298e6c 8b4654          mov     eax,dword ptr [esi+54h]

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 64298e6c (mfc100+0x00258e6c)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 53f21f0c
Attempt to read from address 53f21f0c

CONTEXT:  00000000 -- (.cxr 0x0;r)
eax=53f21eb8 ebx=00000000 ecx=64187d2d edx=7fcde000 esi=53f21eb8 edi=00000001
eip=64298e6c esp=00c3f1b8 ebp=00c3f2ec iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210206
mfc100+0x258e6c:
64298e6c 8b4654          mov     eax,dword ptr [esi+54h] ds:0023:53f21f0c=????????

FAULTING_THREAD:  00000520

DEFAULT_BUCKET_ID:  WRONG_SYMBOLS

PROCESS_NAME:  ww.exe

ADDITIONAL_DEBUG_TEXT:  
You can run '.symfix; .reload' to try to fix the symbol path and load symbols.

MODULE_NAME: mfc100

FAULTING_MODULE: 77bc0000 ntdll

DEBUG_FLR_IMAGE_TIMESTAMP:  4d5f29b8

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

EXCEPTION_PARAMETER1:  00000000

EXCEPTION_PARAMETER2:  53f21f0c

READ_ADDRESS:  53f21f0c 

FOLLOWUP_IP: 
mfc100+258e6c
64298e6c 8b4654          mov     eax,dword ptr [esi+54h]

APP:  ww.exe

ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) x86fre

MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0x520 (0)
Current frame: 
ChildEBP RetAddr  Caller, Callee

PRIMARY_PROBLEM_CLASS:  WRONG_SYMBOLS

BUGCHECK_STR:  APPLICATION_FAULT_WRONG_SYMBOLS

LAST_CONTROL_TRANSFER:  from 6429da08 to 64298e6c

STACK_TEXT:  
WARNING: Stack unwind information not available. Following frames may be wrong.
00c3f2ec 6429da08 64040000 00000000 00000001 mfc100+0x258e6c
00c3f330 6429dac7 64040000 00c3f35c 77be077a mfc100+0x25da08
00c3f33c 77be077a 64040000 00000000 00000001 mfc100+0x25dac7
00c3f35c 77be07f0 6429daa9 64040000 00000000 ntdll!RtlAddMandatoryAce+0x14e
00c3f3a4 77bfa529 6429daa9 64040000 00000000 ntdll!RtlAddMandatoryAce+0x1c4
00c3f49c 77bfa40e 00000000 00000000 6f2d4890 ntdll!RtlExitUserProcess+0x1e7
00c3f4b0 76ff4231 00000000 77e8f3b0 ffffffff ntdll!RtlExitUserProcess+0xcc
00c3f4c4 6f8b3712 00000000 bd3cbe8b 01f1c054 KERNEL32!ExitProcess+0x15
00c3f74c 6f8c19a2 00000001 00c3f76c 6f1686ad mscoreei!GetFileVersion+0x1835
00c3f758 6f1686ad 00000000 77bdab85 6f8a0000 mscoreei!CorExitProcess+0x27
00c3f76c 70737954 00000000 00c3f784 7073798d mscoree!CorExitProcess+0x94
00c3f778 7073798d 00000000 00c3f7c8 70737ab0 MSVCR100!_query_new_mode+0x159
00c3f784 70737ab0 00000000 a2b843a9 00375f5c MSVCR100!_query_new_mode+0x192
00c3f7c8 70737b1d 00000000 00000000 00000000 MSVCR100!_query_new_mode+0x2b5
00c3f7dc 003274ab 00000000 d1ef1931 00000000 MSVCR100!exit+0x11
00c3f864 76ff173e 7fcdf000 00c3f8b4 77c16911 ww!_enc$textbss$begin+0x64ab
00c3f870 77c16911 7fcdf000 a613e810 00000000 KERNEL32!BaseThreadInitThunk+0x12
00c3f8b4 77c168bd ffffffff 77c8560a 00000000 ntdll!LdrInitializeThunk+0x1f0
00c3f8c4 00000000 003275da 7fcdf000 00000000 ntdll!LdrInitializeThunk+0x19c


STACK_COMMAND:  .cxr 0x0 ; kb

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  mfc100+258e6c

FOLLOWUP_NAME:  MachineOwner

IMAGE_NAME:  mfc100.dll

BUCKET_ID:  WRONG_SYMBOLS

FAILURE_BUCKET_ID:  WRONG_SYMBOLS_c0000005_mfc100.dll!Unknown

ANALYSIS_SOURCE:  UM

FAILURE_ID_HASH_STRING:  um:wrong_symbols_c0000005_mfc100.dll!unknown

FAILURE_ID_HASH:  {9e516b68-081f-78d6-cf23-b42f2b3cb573}

Followup: MachineOwner
---------

Screenshot of where the crash occurs: Source code

asked Apr 10 '14 at 07:04 by floele


2 Answers

As discussed in the comments, we had a similar problem with a native C++ application that communicated with a managed C# application running as a COM server. To allow the managed component to communicate events to the C++ app, an event sink was exposed as a simple ATL COM interface from the native side, which on the .NET side was automatically encapsulated in a Runtime Callable Wrapper (RCW).

The access violation on application close - which wasn't always visible except in the event logs - was due to the fact that the RCW didn't call Release() on our ATL COM interfaces until it was garbage collected. As this happened when the .NET runtime closed, which was after the native runtime had shut down, it tried to call back into dead code.

The solution for us was to expose a "shutdown" method on the .NET side that disposed of all the communicating objects, then called:

GC.Collect();
GC.WaitForPendingFinalizers();

OK, I understand that this might not exactly mirror your problem, but the way we found out what was causing it was to use the Managed Debugging Assistants (MDAs), particularly reportAvOnComRelease.

We activated the MDA via registry keys and ran the native app under a debugger to see the additional output, which identified the COM interfaces that were being held too long. As a first step, it would probably be wise to activate all of the MDA options to glean as much information as possible from the crash.

answered Sep 28 '22 at 18:09 by Roger Rowland


I tried debugging this using data breakpoints, but that didn't help a lot. I could see that at some point the data being accessed was overwritten, but that didn't happen in a call stack containing any of my own code.

So I resorted to a simpler method and started removing parts of the program until the error disappeared. In a large application it may be hard to remove some parts without breaking others, but I was able to narrow down the source of the issue.

Basically, the problem stopped occurring after removing a certain call to FreeLibrary. After further investigation it turned out that this call happens during DllMain, which is not allowed:

The entry-point function should perform only simple initialization or termination tasks. It must not call the LoadLibrary or LoadLibraryEx function (or a function that calls these functions), because this may create dependency loops in the DLL load order. This can result in a DLL being used before the system has executed its initialization code. Similarly, the entry-point function must not call the FreeLibrary function (or a function that calls FreeLibrary) during process termination, because this can result in a DLL being used after the system has executed its termination code.

In another SO question, one user apparently noticed a change since Windows 8 in this regard, which would explain why the error only happens on this version of Windows.

We'll now change our application so that FreeLibrary is called at a different point in time.
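
One way such a change could look is sketched below; it is not our actual code. ModuleRegistry, Track, ReleaseAll and OnProcessDetach are hypothetical names, and the handle type and release callback are kept generic so the sketch compiles on any platform. The idea is that modules are released from ordinary shutdown code, while the DLL_PROCESS_DETACH path releases nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: tracks dynamically loaded modules so they can be
// released from normal application code instead of from DllMain.
class ModuleRegistry {
public:
    using Handle = void*;  // stands in for HMODULE (assumption)
    using ReleaseFn = std::function<void(Handle)>;

    explicit ModuleRegistry(ReleaseFn release) : release_(std::move(release)) {
        assert(release_ && "a release callback is required");
    }

    void Track(Handle module) { modules_.push_back(module); }

    // Call from application shutdown code (for example just before
    // returning from WinMain), not from a DLL entry point.
    void ReleaseAll() {
        // Release in reverse order of tracking.
        for (auto it = modules_.rbegin(); it != modules_.rend(); ++it) {
            release_(*it);
        }
        modules_.clear();
    }

    // Call from DllMain on DLL_PROCESS_DETACH: deliberately releases
    // nothing and only reports how many modules were left behind.
    std::size_t OnProcessDetach() const { return modules_.size(); }

private:
    ReleaseFn release_;
    std::vector<Handle> modules_;
};
```

On Windows the callback passed to the constructor would presumably wrap FreeLibrary, for example `[](void* h) { ::FreeLibrary(static_cast<HMODULE>(h)); }`, with ReleaseAll invoked before the process calls exit().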

answered Sep 28 '22 at 20:09 by floele