Just from the start: Since March 1st 2017 this is a bug confirmed by Microsoft. Read comments at the end.
Short description:
I have random crashes in larger application using MFC, ATL. In all such cases after ATL subclassing was used for a window upon simple actions with a window (moving, resizing, setting the focus, painting etc.) I get a crash on a random execution address.
First it looked like a wild pointer or heap corruption but I narrowed the complete scenario down to a very simple application using pure ATL and only Windows API.
Requirements / my used scenarios:
What the application does:
It simply creates a frame window and tries to create many static windows with the windows API. After the static window is created, this window is subclassed with the ATL CWindowImpl::SubclassWindow method. After the subclass operation a simple window message is sent.
What happens:
Not on every run, but very often the application crashes upon SendMessage to the subclassed window. On the 257 window ( or another multiple of 256+1) the subclass fails in some way. The ATL thunk that is created is invalid. It seems that the stored execution address of the new subclass-function isn't correct. Sending any the message to the window causes a crash. The callstack is always the same. The last visible and known address in the callstack is in the atlthunk.dll
atlthunk.dll!AtlThunk_Call(unsigned int,unsigned int,unsigned int,long) Unknown
atlthunk.dll!AtlThunk_0x00(struct HWND__ *,unsigned int,unsigned int,long) Unknown
user32.dll!__InternalCallWinProc@20() Unknown
user32.dll!UserCallWinProcCheckWow() Unknown
user32.dll!SendMessageWorker() Unknown
user32.dll!SendMessageW() Unknown
CrashAtlThunk.exe!WindowCheck() Line 52 C++
The thrown exception in the debugger is shown as:
Exception thrown at 0x0BF67000 in CrashAtlThunk.exe:
0xC0000005: Access violation executing location 0x0BF67000.
or another sample
Exception thrown at 0x2D75E06D in CrashAtlThunk.exe:
0xC0000005: Access violation executing location 0x2D75E06D.
What I know about atlthunk.dll:
Atlthunk.dll seems to be only part of 64bit OS. I found it on a Win 8.1 and Win 10 systems.
If atlthunk.dll is available (all Windows 10 machines), this DLL cares about the thunking. If the DLL isn't present, thunking is done in the standard way: allocating a block on the heap, marking it as executable, adding some load and a jump statement.
If the DLL is present. It contains 256 predefined slots for subclassing. If 256 subclasses are done, the DLL reloads itself a second time into memory and uses the next 256 available slots in the DLL.
As far as I see, the atlthunk.dll belongs to the Windows 10 and isn't exchangeable or redistributable.
Things checked:
Reproducibility:
The problem is somehow reproducible. It doesn't crashes all the time, it crashes randomly. I have a machine were the code crashes on every third execution.
I can repro it on two desktop stations with i7-4770 and a i7-6700.
Other machines seem not to be affected at all (works always on a Laptop i3-3217, or desktop with i7-870)
About the sample:
For simplicity I use a SEH handler to catch the error. If you debug the application the debugger will show the callstack mentioned above. The program can be launched with an integer on the command line.In this case the program launches itself again with the count decremented by 1.So if you launch CrashAtlThunk 100 it will launch the application 100 times. Upon an error the SEH handler will catch the error and shows the text "Crash" in a message box. If the application runs without errors, the application shows "Succeeded" in a message box. If the application is started without a parameter it is just executed once.
Questions:
Notes:
2017-01-20 Support case at Microsoft opened.
The code
// CrashAtlThunk.cpp : Defines the entry point for the application.
//
// Windows Header Files:
#include <windows.h>
// C RunTime Header Files
#include <stdlib.h>
#include <malloc.h>
#include <memory.h>
#include <tchar.h>
#define _ATL_CSTRING_EXPLICIT_CONSTRUCTORS // some CString constructors will be explicit
#include <atlbase.h>
#include <atlstr.h>
#include <atlwin.h>
// Global Variables:
HINSTANCE hInst; // current instance
const int NUM_WINDOWS = 1000;
//------------------------------------------------------
// The problematic code
// After the 256th subclass the application randomly crashes.
class CMyWindow : public CWindowImpl<CMyWindow>
{
public:
virtual BOOL ProcessWindowMessage(_In_ HWND hWnd, _In_ UINT uMsg, _In_ WPARAM wParam, _In_ LPARAM lParam, _Inout_ LRESULT& lResult, _In_ DWORD dwMsgMapID) override
{
return FALSE;
}
};
void WindowCheck()
{
HWND ahwnd[NUM_WINDOWS];
CMyWindow subclass[_countof(ahwnd)];
HWND hwndFrame;
ATLVERIFY(hwndFrame = ::CreateWindow(_T("Static"), _T("Frame"), SS_SIMPLE, 0, 0, 10, 10, NULL, NULL, hInst, NULL));
for (int i = 0; i<_countof(ahwnd); ++i)
{
ATLVERIFY(ahwnd[i] = ::CreateWindow(_T("Static"), _T("DummyWindow"), SS_SIMPLE|WS_CHILD, 0, 0, 10, 10, hwndFrame, NULL, hInst, NULL));
if (ahwnd[i])
{
subclass[i].SubclassWindow(ahwnd[i]);
ATLVERIFY(SendMessage(ahwnd[i], WM_GETTEXTLENGTH, 0, 0)!=0);
}
}
for (int i = 0; i<_countof(ahwnd); ++i)
{
if (ahwnd[i])
::DestroyWindow(ahwnd[i]);
}
::DestroyWindow(hwndFrame);
}
//------------------------------------------------------
int APIENTRY wWinMain(_In_ HINSTANCE hInstance,
_In_opt_ HINSTANCE hPrevInstance,
_In_ LPWSTR lpCmdLine,
_In_ int nCmdShow)
{
hInst = hInstance;
int iCount = _tcstol(lpCmdLine, nullptr, 10);
__try
{
WindowCheck();
if (iCount==0)
{
::MessageBox(NULL, _T("Succeeded"), _T("CrashAtlThunk"), MB_OK|MB_ICONINFORMATION);
}
else
{
TCHAR szFileName[_MAX_PATH];
TCHAR szCount[16];
_itot_s(--iCount, szCount, 10);
::GetModuleFileName(NULL, szFileName, _countof(szFileName));
::ShellExecute(NULL, _T("open"), szFileName, szCount, nullptr, SW_SHOW);
}
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
::MessageBox(NULL, _T("Crash"), _T("CrashAtlThunk"), MB_OK|MB_ICONWARNING);
return FALSE;
}
return 0;
}
Comment after answered by Eugene (Feb. 24th 2017):
I don't want to change my original question, but I want to add some additional information how to get this into a 100% Repro.
1, Change the main function to
int APIENTRY wWinMain(_In_ HINSTANCE hInstance,
_In_opt_ HINSTANCE hPrevInstance,
_In_ LPWSTR lpCmdLine,
_In_ int nCmdShow)
{
// Get the load address of ATLTHUNK.DLL
// HMODULE hMod = LoadLibrary(_T("atlThunk.dll"));
// Now allocate a page at the prefered start address
void* pMem = VirtualAlloc(reinterpret_cast<void*>(0x0f370000), 0x10000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
DWORD dwLastError = ::GetLastError();
hInst = hInstance;
WindowCheck();
return 0;
}
Uncomment the LoadLibrary call. Compile.
Run the programm once and stop in the debugger. Note the address where the library was loaded (hMod).
Stop the program. Now comment the Library call again and change the VirtualAlloc
call to the address of the previous hMod value, this is the prefered load address in this window session.
Recompile and run. CRASH!
Thanks to eugene.
Up to now. Microsoft ist still investigating about this. They have dumps and all code. But I don't have a final answer. Fact is we have a fatal bug in some Windows 64bit OS.
I currently made the following changes to get around this
Open atlstdthunk.h of VS-2015.
Uncomment the #ifdef block completely that defines USE_ATL_THUNK2. Code lines 25 to 27.
Recompile your program.
This enables the old thunking mechanism well known from VC-2010, VC-2013... and this works crash free for me. As long as there are no other already compiled libraries involved that may subclass or use 256 windows via ATL in any way.
Comment (Mar. 1st 2017):
In fact this says. As long as there is no stable patch I can never use the normal ATL thunking again, because I will never know what Window versions out in the world will use my program. Because Windows 8 and Windows 8.1 and Windows 10 prior to RS2 will suffer on this bug.
Final Comment (Mar. 9th 2017):
My advice for all programers: Change the the atlstdthunk.h in your Visual Studio version VS-2015, VS-2017 (see above). I don't understand Microsoft. This bug is a serious problem in the ATL thunking. It may hit every programmer that uses a greater number of windows and/or subclassing.
We only know of a fix in Windows 10 RS2. So all older OS are affected! So I recommend to disable the use of the atlthunk.dll by commenting out the define noted above.
This is the bug inside atlthunk.dll. When it loads itself second time and further this happens manually via MapViewOfFile call. In this case not every address relative to the module base is properly changed (when DLL loaded by LoadLibarary/LoadLibraryEx calls system loader does this automatically). Then if the first time DLL was loaded on preferred base address everything works fine as unchanged addresses point to the similar code or data. But if not you got crash when 257th subclassed window handles messages.
Since Vista we have "address space layout randomization" feature this explains why your code crashes randomly. To have crash every time you have to discover atlthunk.dll base address on your OS (it differs on different OS versions) and do one memory page address space reservation at this address using VirtualAlloc call before the first subclass. To find the base address you can use dumpbin /headers atlthunk.dll
command or parse PE headers manually.
My test shows that on Windows 10 build 14393.693 x32 version is affected but x64 is not. On Server 2012R2 with latest updates both (x32 and x64) versions are affected.
BTW, atlthunk.dll code has around 10 times more CPU instructions per thunk call as previous implementation. It may be not very significant but it slows down the message processing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With