Preamble: This question is specifically concerned with, and only with, the behavior of the dynamic CRT used through /MD. It does not question the validity of any other recommendations wrt. DllMain.
As we've been told: (ref: Dynamic-Link Library Best Practices, MSDN, May 17, 2006)
You should never perform the following tasks from within DllMain:
- ...
- Use the memory management function from the dynamic C Run-Time (CRT). If the CRT DLL is not initialized, calls to these functions can cause the process to crash.
- ...
Others have questioned this already (as in: questioned the validity of the argument), and since we helpfully get an answer there, we can clearly see one rather simple case where this could potentially cause trouble:
You are working from the assumption that the entrypoint for a DLL is always _DllMainCRTStartup. This is not the case, it is merely the linker's default. It can be anything a programmer wants it to be, swiftly and easily changed with the linker's /ENTRYPOINT option. There is nothing that Microsoft can do to prevent this.
So these are the elements of this question:
Is there any other situation when linking /MD and not supplying a custom /ENTRYPOINT, where the dynamic CRT ought to not be fully initialized?
LoadLibrary calls at all, just link time DLL dependencies.
Bonus: The MS docs specifically call out "memory management function", but as far as I can tell, if the CRT is not initialized, potentially any CRT function should be unsafe. Why call out memory management functions in this way?
No.3: Wrt. the custom ENTRYPOINT: I don't quite see how this can be such an important scenario that it need be included in the not-do-in-DllMain list without further qualification. IFF I supply a custom entry point, I'm responsible for correctly initializing the CRT, or the CRT will not work properly anywhere in my program, not just DllMain. Why call out the DllMain part specifically?
This leads me back to Q.1, namely whether this is the only scenario where this is problematic for the dynamic CRT. A clarification or eye-opener as to why this would be more important for DllMain than for other parts of the DLL, or what I might be missing here, would be appreciated.
Rationale: I feel I should add this for context: I am asking this because we have massive amounts of code doing things via global C++ object constructors. Things that actually broke have been vetted out over the years (like concurrent LoadLibrary, thread sync, etc.), but all the code is full of std C++ and CRT functions that have happily been working for years on Windows XP, 7 and Windows 10 without any known hiccups. While I'm not one to cry "but it just works", I have to make an engineering judgment here on whether there is any short-to-medium term value in trying to "fix" this. Therefore, I would appreciate it if the soapbox answers could be left in their boxes.
Is there any other situation when linking /MD and not supplying a custom /ENTRYPOINT, where the dynamic CRT ought to not be fully initialized?
First, some notation:
- X[Y, Z]: module X statically imports from (depends on) modules Y and Z
- X_DllMain: the entry point of module X
- X<Y>: X_DllMain calls LoadLibrary(Y)
When we use /MD, the CRT lives in separate DLL(s). "Initialized" in this context means that the entry point(s) of the CRT DLL(s) have already been called. So the question can be made more general and clear: given X[Y], is Y_DllMain always called before X_DllMain?
In the general case, no, because there can be a circular dependency, as in Y[X] or Y[Z[X]].
The best-known example is user32[gdi32] together with gdi32[user32] (or, on Windows 10, gdi32[gdi32full[user32]]). So which must be called first, user32_DllMain or gdi32_DllMain? However, it is obvious that no CRT DLL depends on our custom DLL, so let us exclude the circular dependency case.
When the loader loads module X, it also loads all of X's dependency modules that are not already in memory (and their dependencies; this is a recursive process). The loader then builds a call graph and begins calling the modules' entry points. Obviously, if A[B], the loader always tries to call B_DllMain before A_DllMain (except for a circular dependency, where the order of calls is undefined). But which modules end up in the call graph? All of X's dependency modules? Of course not. Some of these modules may already be in memory (loaded) when we begin loading X, so their entry points have already been called with DLL_PROCESS_ATTACH and must not be called a second time now. This strategy is used in XP, Vista and Windows 7:
When we load X: if A[B], call B_DllMain before A_DllMain.
example: loaded X[Y[W[Z]], Z]
//++begin load X
Z_DllMain
W_DllMain
Y_DllMain
X_DllMain
// --end load X
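The ordering rule above can be sketched as a post-order walk over the import graph. This is only a toy model of the behaviour described in this answer, not the real loader algorithm; `ImportGraph`, `entryPointOrder` and `exampleGraph` are hypothetical names introduced for illustration, and the `alreadyLoaded` parameter models modules whose entry points were called by an earlier load.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: module name -> names of the modules it statically imports.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Post-order walk: all dependencies of a module are emitted before the module itself.
static void visit(const ImportGraph& graph, const std::string& name,
                  std::set<std::string>& seen, std::vector<std::string>& order) {
    if (!seen.insert(name).second) return;  // already handled (this also stops cycles)
    auto it = graph.find(name);
    assert(it != graph.end());              // every module must be described in the graph
    for (const auto& dep : it->second) visit(graph, dep, seen, order);
    order.push_back(name);                  // "call" name_DllMain
}

// Order in which entry points would be called when loading `root`.
// Modules in `alreadyLoaded` are skipped, as their entry points ran earlier.
std::vector<std::string> entryPointOrder(const ImportGraph& graph, const std::string& root,
                                         const std::set<std::string>& alreadyLoaded = {}) {
    std::set<std::string> seen = alreadyLoaded;
    std::vector<std::string> order;
    visit(graph, root, seen, order);
    return order;
}

// The example from the text: X[Y[W[Z]], Z].
ImportGraph exampleGraph() {
    ImportGraph g;
    g["X"] = std::vector<std::string>{"Y", "Z"};
    g["Y"] = std::vector<std::string>{"W"};
    g["W"] = std::vector<std::string>{"Z"};
    g["Z"] = std::vector<std::string>{};
    return g;
}
```

Calling `entryPointOrder(exampleGraph(), "X")` yields Z, W, Y, X, matching the listing above. Passing `{"Z"}` as `alreadyLoaded` yields W, Y, X, because Z's entry point is assumed to have run during an earlier load.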
But this scheme does not take the following case into account: a module can already be in memory while its entry point has not yet been called. How can this happen?
It can happen when some module's entry point calls LoadLibrary.
example - loaded X[Y<W[ Z]>, Z]
//++begin load X
Y_DllMain
//++begin load W
W_DllMain
//--end load W
Z_DllMain
X_DllMain
// --end load X
So W_DllMain will be called before Z_DllMain, despite W[Z]. This is exactly why calling LoadLibrary from a DLL entry point is not recommended.
But Dynamic-Link Library Best Practices says of calling LoadLibrary or LoadLibraryEx from DllMain:
This can cause a deadlock or a crash.
The part about a deadlock is not true: no deadlock is possible here. Where? How? We already hold the loader lock inside the DLL entry point, and this lock can be acquired recursively. A crash really can happen (before Windows 8).
Or another false statement:
Call ExitThread. Exiting a thread during DLL detach can cause the loader lock to be acquired again, causing a deadlock or a crash.
What really happens is that the thread exits without releasing the loader lock, which then stays held forever. As a result, any new thread creation or exit, any new DLL load or unload, or just an ExitProcess call will hang when it tries to acquire the loader lock. So there really will be a deadlock here, but not during the ExitThread call itself; it comes later.
An interesting note, of course: Windows itself calls LoadLibrary from DllMain. user32.dll always calls LoadLibrary for imm32.dll from its entry point (still true on Windows 10).
But starting with Windows 8 (or 8.1) the loader became smarter about handling dependency modules. The rule for which entry points get called changed: the loader now calls the entry points of modules newly loaded (after X), and also of any module that is already in memory but not yet initialized.
So on modern Windows (8+), loading X[Y<W[Z]>, Z] gives:
//++begin load X
Y_DllMain
//++begin load W
Z_DllMain
W_DllMain
//--end load W
X_DllMain
// -- end load X
The initialization of Z is moved into W's load call graph. As a result, everything is now correct.
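The difference between the two strategies can be reproduced with a small toy model. This is a sketch built only from the behaviour described in this answer, not the actual loader implementation; `ToyLoader`, `ToyModule`, `LoaderModel` and `simulate` are hypothetical names. "Legacy" stands for the XP/Vista/7 rule (only modules newly mapped by the current load get their entry point called) and "Modern" for the Windows 8+ rule (any dependency that is not yet initialized is initialized first).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class LoaderModel { Legacy, Modern };  // Legacy ~ XP/Vista/7, Modern ~ 8+ (assumption of this model)

struct ToyModule {
    std::vector<std::string> imports;         // link-time dependencies, X[Y, Z]
    std::vector<std::string> loadsFromEntry;  // LoadLibrary calls made in the entry point, X<Y>
};

class ToyLoader {
public:
    ToyLoader(std::map<std::string, ToyModule> modules, LoaderModel model)
        : modules_(std::move(modules)), model_(model) {}

    // Models LoadLibrary(name): map the module and its imports, then run entry points.
    void load(const std::string& name) {
        std::set<std::string> newlyMapped;
        mapModule(name, newlyMapped);
        std::set<std::string> visited;
        initModule(name, newlyMapped, visited);
    }

    const std::vector<std::string>& entryPointOrder() const { return order_; }

private:
    void mapModule(const std::string& name, std::set<std::string>& newlyMapped) {
        assert(modules_.count(name) == 1);
        if (!mapped_.insert(name).second) return;  // already in memory
        newlyMapped.insert(name);
        for (const auto& dep : modules_.at(name).imports) mapModule(dep, newlyMapped);
    }

    void initModule(const std::string& name, const std::set<std::string>& newlyMapped,
                    std::set<std::string>& visited) {
        if (!visited.insert(name).second) return;
        const ToyModule& mod = modules_.at(name);
        for (const auto& dep : mod.imports) initModule(dep, newlyMapped, visited);
        if (entered_.count(name)) return;  // entry point already called (or still running)
        // Legacy rule: a module mapped by an earlier load is assumed to be initialized.
        if (model_ == LoaderModel::Legacy && !newlyMapped.count(name)) return;
        entered_.insert(name);
        order_.push_back(name);            // "call" name_DllMain
        for (const auto& lib : mod.loadsFromEntry) load(lib);  // nested LoadLibrary
    }

    std::map<std::string, ToyModule> modules_;
    LoaderModel model_;
    std::set<std::string> mapped_;   // modules in memory
    std::set<std::string> entered_;  // modules whose entry point has been called
    std::vector<std::string> order_;
};

// The example from the text: X[Y<W[Z]>, Z].
std::vector<std::string> simulate(LoaderModel model) {
    std::map<std::string, ToyModule> modules;
    modules["X"] = ToyModule{{"Y", "Z"}, {}};
    modules["Y"] = ToyModule{{}, {"W"}};
    modules["W"] = ToyModule{{"Z"}, {}};
    modules["Z"] = ToyModule{};
    ToyLoader loader(modules, model);
    loader.load("X");
    return loader.entryPointOrder();
}
```

`simulate(LoaderModel::Legacy)` yields Y, W, Z, X (W runs before its dependency Z), while `simulate(LoaderModel::Modern)` yields Y, Z, W, X, matching the two listings above.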
To test this, we can build the following solution: test.exe[ kernel32, D1< D2[kernel32, msvcrt] >, msvcrt ]
- D2 imports from kernel32 and msvcrt and exports SomeFunc
- D1 imports from kernel32 only; it calls LoadLibraryW(L"D2") from its entry point and then calls D2.SomeFunc
- test.exe imports from kernel32, D1 and msvcrt (exactly in this order! This is critically important: D1 must come before msvcrt in the imports, so D1 must be placed before msvcrt on the linker command line)
As a result, the D1 entry point will be called before msvcrt's. This is normal, since D1 does not depend on msvcrt. But when D1 loads D2 from its entry point, things become interesting.
code for D2.dll ( /NODEFAULTLIB kernel32.lib msvcrt.lib )
#include <Windows.h>

extern "C"
{
    __declspec(dllimport) int __cdecl sprintf(PSTR buf, PCSTR format, ...);
}

BOOLEAN WINAPI MyEp( HMODULE , DWORD ul_reason_for_call, PVOID )
{
    if (ul_reason_for_call == DLL_PROCESS_ATTACH)
    {
        OutputDebugStringA("D2.DllMain\n");
    }
    return TRUE;
}

INT_PTR WINAPI SomeFunc()
{
    __pragma(message(__FUNCDNAME__))
    char buf[32];
    // this is only for link to msvcrt.dll
    sprintf(buf, "D2.SomeFunc\n");
    OutputDebugStringA(buf);
    return 0;
}

#ifdef _WIN64
#define FuncName "?SomeFunc@@YA_JXZ"
#else
#define FuncName "?SomeFunc@@YGHXZ"
#endif

__pragma(comment(linker, "/export:" FuncName ",@1,NONAME,PRIVATE"))
code for D1.dll ( /NODEFAULTLIB kernel32.lib )
#include <Windows.h>

#pragma warning(disable : 4706)

BOOLEAN WINAPI MyEp( HMODULE hmod, DWORD ul_reason_for_call, PVOID )
{
    if (ul_reason_for_call == DLL_PROCESS_ATTACH)
    {
        OutputDebugStringA("D1.DllMain\n");
        if (hmod = LoadLibraryW(L"D2"))
        {
            if (FARPROC fp = GetProcAddress(hmod, (PCSTR)1))
            {
                fp();
            }
        }
    }
    return TRUE;
}

INT_PTR WINAPI SomeFunc()
{
    __pragma(message(__FUNCDNAME__))
    OutputDebugStringA("D1.SomeFunc\n");
    return 0;
}

#ifdef _WIN64
#define FuncName "?SomeFunc@@YA_JXZ"
#else
#define FuncName "?SomeFunc@@YGHXZ"
#endif

__pragma(comment(linker, "/export:" FuncName ",@1,NONAME"))
code for exe ( /NODEFAULTLIB kernel32.lib D1.lib msvcrt.lib )
#include <Windows.h>

extern "C"
{
    __declspec(dllimport) int __cdecl sprintf(PSTR buf, PCSTR format, ...);
}

__declspec(dllimport) INT_PTR WINAPI SomeFunc();

void ep()
{
    char buf[32];
    // this is only for link to msvcrt.dll
    sprintf(buf, "exe entry\n");
    OutputDebugStringA(buf);
    ExitProcess((UINT)SomeFunc());
}
output for xp:
LDR: D1.dll loaded - Calling init routine
D1.DllMain
Load: D2.dll
LDR: D2.dll loaded - Calling init routine
D2.DllMain
D2.SomeFunc
LDR: msvcrt.dll loaded - Calling init routine
exe entry
D1.SomeFunc
for win7:
LdrpRunInitializeRoutines - INFO: Calling init routine for DLL "D1.dll"
D1.DllMain
Load: D2.dll
LdrpRunInitializeRoutines - INFO: Calling init routine for DLL "D2.DLL"
D2.DllMain
D2.SomeFunc
LdrpRunInitializeRoutines - "msvcrt.dll"
exe entry
D1.SomeFunc
In both cases the call flow is the same: D2.DllMain is called before the msvcrt entry point, despite D2[msvcrt].
But on Windows 8.1 and Windows 10 the call flow is different:
LdrpInitializeNode - INFO: Calling init routine for DLL "D1.dll"
D1.DllMain
LdrpInitializeNode - INFO: Calling init routine for DLL "msvcrt.dll"
LdrpInitializeNode - INFO: Calling init routine for DLL "D2.DLL"
D2.DllMain
D2.SomeFunc
exe entry
D1.SomeFunc
The D2 entry point is called after msvcrt initialization.
So what is the conclusion?
If, at the time module X[Y] is loaded, there is no loaded-but-uninitialized Y in memory, Y_DllMain will be called before X_DllMain. In other words, this holds as long as nobody calls LoadLibrary(X) (or LoadLibrary(Z[X])) from a DLL entry point. So if your DLL is loaded the "normal" way (not by a LoadLibrary call from DllMain, and not injected from a driver on some DLL load event), you can be sure that the CRT entry point has already been called (the CRT is initialized).
Moreover, if you run on Windows 8.1+ and X[Y] is loaded, Y_DllMain will always be called before X_DllMain.
Now about a custom /ENTRYPOINT in your DLL.
Even if you use the CRT in separate DLLs, a small piece of CRT code is statically linked into your module: DllMainCRTStartup, which calls your function DllMain (which is not the entry point) by name. So with the dynamic CRT we really have two CRT parts: the main part in separate DLLs, which is initialized before your DLL entry point is called (apart from the special case described above on Windows 7, Vista and XP), and a small static part (code inside your module). When this static part gets called depends entirely on you. This part, DllMainCRTStartup, does some internal initialization, initializes the global objects in your code (initterm) and calls DllMain; after DllMain returns on DLL detach, it calls the destructors for the globals.
If you set a custom entry point in the DLL, then at that point the CRT in the separate DLLs is already initialized, but your static CRT part is not (and neither are your global objects). From this custom entry point you will need to call DllMainCRTStartup.
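The division of labour between a custom entry point, the static CRT stub and DllMain can be illustrated with a portable toy model. It is not CRT source code and uses no Windows API; `ToyDll`, `toyCrtStartup`, `toyCustomEntry` and `runDemo` are hypothetical stand-ins for the module, for DllMainCRTStartup and for a user-supplied entry point.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Stand-ins for DLL_PROCESS_ATTACH / DLL_PROCESS_DETACH.
enum class Reason { ProcessAttach, ProcessDetach };

struct ToyDll {
    std::vector<std::function<void()>> initializers;  // table walked on attach (initterm-like)
    std::vector<std::function<void()>> destructors;   // registered while initializers run
    std::function<bool(Reason)> dllMain;              // the user's DllMain
    std::vector<std::string> log;
};

// Models the static CRT part: construct globals, call DllMain, destroy globals on detach.
bool toyCrtStartup(ToyDll& dll, Reason reason) {
    assert(static_cast<bool>(dll.dllMain));
    if (reason == Reason::ProcessAttach) {
        for (auto& init : dll.initializers) init();
        return dll.dllMain(reason);
    }
    bool ok = dll.dllMain(reason);
    // Destructors run in reverse order of construction.
    for (auto it = dll.destructors.rbegin(); it != dll.destructors.rend(); ++it) (*it)();
    return ok;
}

// Models a custom entry point: globals are NOT constructed until it forwards to the stub.
bool toyCustomEntry(ToyDll& dll, Reason reason) {
    dll.log.push_back(reason == Reason::ProcessAttach ? "custom entry (attach)"
                                                      : "custom entry (detach)");
    return toyCrtStartup(dll, reason);
}

std::vector<std::string> runDemo() {
    ToyDll dll;
    dll.initializers.push_back([&dll] {
        dll.log.push_back("global ctor");
        dll.destructors.push_back([&dll] { dll.log.push_back("global dtor"); });
    });
    dll.dllMain = [&dll](Reason r) {
        dll.log.push_back(r == Reason::ProcessAttach ? "DllMain attach" : "DllMain detach");
        return true;
    };
    toyCustomEntry(dll, Reason::ProcessAttach);
    toyCustomEntry(dll, Reason::ProcessDetach);
    return dll.log;
}
```

`runDemo()` returns the log in this order: custom entry (attach), global ctor, DllMain attach, custom entry (detach), DllMain detach, global dtor. If `toyCustomEntry` did not forward to `toyCrtStartup`, neither the global constructor nor DllMain would run in this model, which is the situation the paragraph above warns about.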