I have a large application that recently started exhibiting rather strange behavior when running in a debugger. First, the basics:
OS: Windows 7 64-bit.
Application: Multithreaded VCL app with many dlls, bpls, and other components.
Compiler/IDE: Embarcadero RAD Studio 2010.
The observed symptom is this: While the debugger is attached to my application, certain tasks cause the application to crash. The particulars are furthermore perplexing: My application stops with a Windows message saying, "YourApplication has stopped working." And it helpfully offers to send a minidump to Microsoft.
It should be noted: the application doesn't crash when the debugger is not attached. Also, the debugger doesn't indicate any exceptions or other issues while the application is running.
Setting and stepping through breakpoints seems to affect the point at which the application crashes, but I suspect that is a symptom of debugging a thread other than the problematic one.
These crashes also occur on the computers on my colleagues, with the same behavior I observe. This leads me to not suspect a failed installation of something on my computer particularly. My colleagues experiencing the issue are also running Windows 7 64-bit. I have no colleagues not experiencing the issue.
I've collected an analyzed a number of full dumps from the crashes. I discovered that the failure was actually happening in the same place each time. Here is the exception data from the dumps (it is always the same, except of course the ThreadId):
Exception Information
ThreadId: 0x000014C0
Code: 0x4000001F Unknown (4000001F)
Address: 0x773F2507
Flags: 0x00000000
NumberParameters: 0x00000001
0x00000000
Google reveals that Code 0x4000001F is actually STATUS_WX86_BREAKPOINT. Microsoft unhelpfully describes it as "An exception status code that is used by the Win32 x86 emulation subsystem."
Here are the stack details (which don't seem to vary):
0x773F2507: ntdll.dll+0x000A2507: RtlQueryCriticalSectionOwner + 0x000000E8
0x773F3DAB: ntdll.dll+0x000A3DAB: RtlQueryProcessLockInformation + 0x0000020D
0x773D2ED9: ntdll.dll+0x00082ED9: RtlUlonglongByteSwap + 0x00005C69
0x773F3553: ntdll.dll+0x000A3553: RtlpQueryProcessDebugInformationRemote + 0x00000044
0x74F73677: kernel32.dll+0x00013677: BaseThreadInitThunk + 0x00000012
0x77389F02: ntdll.dll+0x00039F02: RtlInitializeExceptionChain + 0x00000063
0x77389ED5: ntdll.dll+0x00039ED5: RtlInitializeExceptionChain + 0x00000036
It is worth noting that there appears to be a function epilog at 0x773F24ED, which rather suggests that the RtlQueryCriticalSectionOwner is a red herring. Likewise, a function epilog casts doubt on RtlQueryProcessLockInformation. The 0x5C69 offset casts doubt on the RtlUlonglongByteSwap. The other symbols look legit, though.
Specifically, RtlpQueryProcessDebugInformationRemote looks legitimate. Some people on the internet (http://www.cygwin.com/ml/cygwin-talk/2006-q2/msg00050.html) seem to think that it is created by the debugger to collect debug information. That theory seems sound to me, since it only seems to appear when the debugger is attached.
As always, when something breaks, something changed that broke it. In this case, that something is dynamically loading a new dll. I can cause the crash to stop happening by not dynamically loading the particular dll. I'm not convinced that the dll loading is related, but here are the details, just in case:
The dll source is C. Here are the compile options that are not set to the default:
Language Compliance: ANSI
Merge duplicate strings: True
Read-only strings: True
PCH usage: Do not use
Dynamic RTL: False
(The Project Options say False is default for Dynamic RTL, though it was set to True when I created the dll project.)
The dll is loaded with LoadLibrary and freed with FreeLibrary. All seems to be fine with the loading and unloading of the module. However, shortly after the library is unloaded (with FreeLibrary), the aforementioned thread crashes the program. For debugging, I removed all actual calls to the library (including, for more testing, DllMain). No combination of calls or not calls, DllMain or no DllMain, or anything else seemed to alter the behavior of the crash in any way. Simply loading and unloading the dll invokes the crash later on.
Furthermore, changing the dll to use the Dynamic RTL also causes the debugger thread crash to cease. This is undesirable because the compiled dll really should be usable without CodeGear Runtime available. Also, dll size is important. The C code contained in the dll does not make use of any libraries. (It includes no headers, even standard library headers. No malloc/free, no printf, no nothin'. It contains only functions that depend exclusively on their inputs and do not require dynamic allocation.) It is also undesirable because "fixing" a bug by changing stuff until it works without understanding why it works is really never a good plan. (It tends to lead to bug recurrences and strange coding practices. But really, at this point, if I can't find anything else, I may admit defeat on this count.)
And finally, my issue may be related to one of these issues:
Any ideas or suggestions would be appreciated.
I solved the above mentioned problem by using a modified version of the PatchINT3 workaround, which was published in 2007 for BDS 2006:
procedure PatchINT3;
const
INT3: Byte = $CC;
NOP: Byte = $90;
var
NTDLL: THandle;
BytesWritten: DWORD;
Address: PByte;
begin
if Win32Platform <> VER_PLATFORM_WIN32_NT then
Exit;
NTDLL := GetModuleHandle('NTDLL.DLL');
if NTDLL = 0 then
Exit;
Address := GetProcAddress(NTDLL, 'RtlQueryCriticalSectionOwner');
if Address = nil then
Exit;
Inc(Address, $E8);
try
if Address^ <> INT3 then
Exit;
if WriteProcessMemory(GetCurrentProcess, Address, @NOP, 1, BytesWritten)
and (BytesWritten = 1) then
FlushInstructionCache(GetCurrentProcess, Address, 1);
except
//Do not panic if you see an EAccessViolation here, it is perfectly harmless!
on EAccessViolation do
;
else
raise;
end;
end;
Call this routine once after you have loaded the DLL in your thread. The patch fixes a user breakpoint in ntdll.dll version 6.1.7601.17725 and changes it to a NOP.
If there is no user breakpoint (INT3 (=$CC) opcode) at the expected address, the patch routine does nothing and exits.
Hope that helps,
Andreas
Footnote
The original source of PatchINT3 can be found here:
http://coding.derkeiler.com/Archive/Delphi/borland.public.delphi.non-technical/2007-01/msg04431.html
Footnote2
The same function in C++:
void PatchINT3()
{
unsigned char INT3 = 0xCC;
unsigned char NOP = 0x90;
if (Win32Platform != VER_PLATFORM_WIN32_NT)
{
return;
}
HMODULE ntdll = GetModuleHandle(L"NTDLL.DLL");
if (ntdll == NULL)
{
return;
}
unsigned char *address = (unsigned char*)GetProcAddress(ntdll,
"RtlQueryCriticalSectionOwner");
if (address == NULL)
{
return;
}
address += 0xE8;
try
{
if (*address != INT3)
{
return;
}
unsigned long bytes_written = 0;
if (WriteProcessMemory(GetCurrentProcess(), address, &NOP, 1,
&bytes_written) && (bytes_written == 1))
{
FlushInstructionCache(GetCurrentProcess, address, 1);
}
}
catch (EAccessViolation &e)
{
//Do not panic if you see an EAccessViolation
//here, it is perfectly harmless!
}
catch(...)
{
throw;
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With