
In which cases is the dynamic CRT not already initialized on call to user supplied DllMain?

Preamble: This question is specifically concerned with, and only with, the behavior of the dynamic CRT used through /MD. It does not question the validity of any other recommendations wrt. DllMain.


As we've been told: (ref: Dynamic-Link Library Best Practices, MSDN, May 17, 2006)

You should never perform the following tasks from within DllMain:

  • ...
  • Use the memory management function from the dynamic C Run-Time (CRT). If the CRT DLL is not initialized, calls to these functions can cause the process to crash.
  • ...

Others have questioned this already (as in: questioned the validity of the argument), and since we helpfully got an answer there, we can clearly see one rather simple case where this could potentially cause trouble:

You are working from the assumption that the entrypoint for a DLL is always _DllMainCRTStartup. This is not the case, it is merely the linker's default. It can be anything a programmer wants it to be, swiftly and easily changed with the linker's /ENTRYPOINT option. There is nothing that Microsoft can do to prevent this.

So these are the elements of this question:

  • Is there any other situation when linking /MD and not supplying a custom /ENTRYPOINT, where the dynamic CRT ought to not be fully initialized?

    • Specifically, if all DLL loading is done only through "static dependencies", i.e. no explicit LoadLibrary calls at all, just link-time DLL dependencies.
  • Bonus: The MS docs specifically call out "memory management function", but as far as I can tell, if the CRT is not initialized, potentially any CRT function should be unsafe. Why call out memory management functions in this way?

  • No.3:

    Wrt. the custom ENTRYPOINT: I don't quite see how this can be such an important scenario that it needs to be included in the not-do-in-DllMain list without further qualification. IFF I supply a custom entry point, I'm responsible for correctly initializing the CRT, or the CRT will not work properly anywhere in my program, not just in DllMain. Why call out the DllMain part specifically?

    This leads me back to Q.1, namely whether this is the only scenario where this is problematic for the dynamic CRT. A clarification or eye-opener on why this would be more important for DllMain than for other parts of the DLL, or on what I might be missing here, would be appreciated.


Bonus links:

  • When are global objects constructed and destructed by Visual C++?
  • DllMain : a horror story
  • Calling LoadLibrary from DllMain

Rationale: I feel I should add this for context: I am asking this because we have massive amounts of code doing things via global C++ object constructors. Things that actually broke have been vetted out over the years (like concurrent LoadLibrary, thread sync, etc.), but all the code is full of std C++ and CRT functions, that happily have been working for years on Windows XP, 7 and Windows 10 without any known hiccups. While I'm not one to cry "but it just works", I have to do an engineering judgment here on whether there is any short-to-medium value in trying to "fix" this. Therefore, I would appreciate if the soapbox answers could be left in their boxes.

Martin Ba asked Oct 28 '22 20:10



1 Answer

Is there any other situation when linking /MD and not supplying a custom /ENTRYPOINT, where the dynamic CRT ought to not be fully initialized?

First, some notation:

  • X has static imports from (depends on) Y and Z: X[Y, Z]
  • the entry point of X: X_DllMain
  • X_DllMain calls LoadLibrary(Y): X<Y>

When we use /MD, the CRT lives in separate DLL(s). "Initialized" in this context means that the entry point(s) of the CRT DLL(s) have already been called. So the question can be put more generally and more clearly:

Given X[Y], is Y_DllMain called before X_DllMain?

In the general case, no, because there can be a circular dependency: Y[X] or Y[Z[X]].

The best-known example is user32[gdi32] and gdi32[user32] (or, on win10, gdi32[gdi32full[user32]]). So which must be called first, user32_DllMain or gdi32_DllMain? However, it is obvious that no CRT DLL depends on our custom DLL, so let us exclude the circular dependency case.

When the loader loads module X, it loads all of X's dependency modules (and their dependencies; this is a recursive process) that are not already in memory. Then the loader builds a call graph and begins calling module entry points. Obviously, if A[B], the loader always tries to call B_DllMain before A_DllMain (except for a circular dependency, where the order of calls is undefined). But which modules will be in the call graph? All of X's dependency modules? Of course not. Some of these modules can already be in memory (loaded) when we begin to load X, so their entry points have already been called with DLL_PROCESS_ATTACH and must not be called a second time now. This strategy is used in xp, vista and win7:

When we load X:

  1. load, or locate in memory, all of its dependency modules
  2. call the entry points of newly loaded (after X) modules only
  3. if A[B], call B_DllMain before A_DllMain

Example: loading X[Y[W[Z]], Z]

//++begin load X
Z_DllMain
W_DllMain
Y_DllMain
X_DllMain
// --end load X
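
The three steps can be sketched as a post-order walk over the dependency graph. This is only a toy model, not the real loader: DependencyGraph, ScheduleInit and InitOrder are hypothetical names, and "already in memory" is assumed to always mean "already initialized", which is exactly the assumption that breaks further down.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: module name -> names of its static imports.
using DependencyGraph = std::map<std::string, std::vector<std::string>>;

// Visit dependencies first (post-order), so that for A[B] the entry point
// of B is scheduled before the entry point of A. Names already in `visited`
// are skipped: they are either already loaded or already scheduled.
void ScheduleInit(const DependencyGraph& graph, const std::string& name,
                  std::set<std::string>& visited,
                  std::vector<std::string>& order)
{
    if (!visited.insert(name).second)
        return; // also stops the walk on circular dependencies
    auto it = graph.find(name);
    if (it != graph.end())
        for (const std::string& dep : it->second)
            ScheduleInit(graph, dep, visited, order);
    order.push_back(name + "_DllMain");
}

// Entry points called when `root` is loaded while the modules in
// `alreadyLoaded` are in memory (and, by assumption, initialized).
std::vector<std::string> InitOrder(const DependencyGraph& graph,
                                   const std::string& root,
                                   std::set<std::string> alreadyLoaded = {})
{
    std::vector<std::string> order;
    ScheduleInit(graph, root, alreadyLoaded, order);
    return order;
}
```

For the graph X[Y[W[Z]], Z], InitOrder(graph, "X") yields Z_DllMain, W_DllMain, Y_DllMain, X_DllMain, the same sequence as the trace above. If Z is passed in alreadyLoaded, its entry point is not scheduled again.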

But this scenario does not take the following case into account: a module can already be in memory while its entry point has not yet been called. How can this happen? It can happen when some module's entry point calls LoadLibrary.

Example: loading X[Y<W[Z]>, Z]

//++begin load X
Y_DllMain
  //++begin load W
  W_DllMain
  //--end load W
Z_DllMain
X_DllMain
// --end load X

So W_DllMain will be called before Z_DllMain, despite W[Z]. This is exactly why calling LoadLibrary from a DLL entry point is not recommended.


But Dynamic-Link Library Best Practices says about this:

This can cause a deadlock or a crash.

The part about a deadlock is not true; there simply cannot be any deadlock here. Where? How? Inside the DLL entry point we already hold the loader lock, and this lock can be acquired recursively. A crash really can happen (before win8).

Or another false statement:

Call ExitThread. Exiting a thread during DLL detach can cause the loader lock to be acquired again, causing a deadlock or a crash.

  • "can cause the loader lock to be acquired again": not "can", but always
  • "causing a deadlock": false, we already hold this lock
  • "a crash": there will not be any crash, so this is one more false claim

What really happens is that the thread exits without releasing the loader lock, so the lock stays held forever. As a result, any new thread creation or exit, any new DLL load or unload, or just an ExitProcess call hangs when it tries to acquire the loader lock. So there really will be a deadlock here, but not during the ExitThread call; it comes later.

And of course an interesting note: Windows itself calls LoadLibrary from DllMain. user32.dll always calls LoadLibrary for imm32.dll from its entry point (still true on win10).


But beginning with win8 (or win8.1) the loader became smarter about handling dependency modules. Step 2 is now changed:

2. call the entry points of newly loaded (after X) modules, and of any module that is not yet initialized.

So on modern Windows (8+), loading X[Y<W[Z]>, Z] gives:

//++begin load X
Y_DllMain
  //++begin load W
  Z_DllMain
  W_DllMain
  //--end load W
X_DllMain
// -- end load X

The initialization of Z is moved into the call graph of the W load. As a result, everything is now correct.
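
The difference between the two strategies can be reproduced with a small portable model. This is only a sketch under stated assumptions: LoaderModel, LoaderPolicy and SimulateExample are hypothetical names, nothing here touches the real ntdll loader, and the model encodes just the numbered steps above plus the win8+ change to step 2.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names: Legacy = xp/vista/win7 strategy, Modern = win8+ strategy.
enum class LoaderPolicy { Legacy, Modern };

class LoaderModel {
public:
    explicit LoaderModel(LoaderPolicy policy) : policy_(policy) {}

    // X[Y, Z]: static imports of a module.
    void SetImports(const std::string& name, std::vector<std::string> imports) {
        imports_[name] = std::move(imports);
    }
    // X<Y>: the entry point of `name` calls LoadLibrary(target).
    void SetLoadFromEntryPoint(const std::string& name, const std::string& target) {
        loadsFromEntry_[name] = target;
    }
    void Load(const std::string& root) {
        std::set<std::string> newlyMapped;
        Map(root, newlyMapped);              // step 1
        std::set<std::string> seen;
        Initialize(root, newlyMapped, seen); // steps 2 and 3
    }
    const std::vector<std::string>& Calls() const { return calls_; }

private:
    void Map(const std::string& name, std::set<std::string>& newlyMapped) {
        if (!mapped_.insert(name).second) return; // already in memory
        newlyMapped.insert(name);
        for (const auto& dep : imports_[name]) Map(dep, newlyMapped);
    }
    void Initialize(const std::string& name, const std::set<std::string>& newlyMapped,
                    std::set<std::string>& seen) {
        if (!seen.insert(name).second) return;
        // Dependencies first, so B_DllMain precedes A_DllMain for A[B].
        for (const auto& dep : imports_[name]) Initialize(dep, newlyMapped, seen);
        // Legacy: only modules mapped by this Load call are candidates.
        // Modern: any module whose entry point has not run yet is a candidate.
        const bool candidate =
            policy_ == LoaderPolicy::Modern || newlyMapped.count(name) != 0;
        if (!candidate || !initStarted_.insert(name).second) return;
        calls_.push_back(name + "_DllMain");
        auto it = loadsFromEntry_.find(name);
        if (it != loadsFromEntry_.end()) Load(it->second); // nested load inside the entry point
    }

    LoaderPolicy policy_;
    std::map<std::string, std::vector<std::string>> imports_;
    std::map<std::string, std::string> loadsFromEntry_;
    std::set<std::string> mapped_;
    std::set<std::string> initStarted_;
    std::vector<std::string> calls_;
};

// The example from the text: X[Y<W[Z]>, Z].
std::vector<std::string> SimulateExample(LoaderPolicy policy) {
    LoaderModel loader(policy);
    loader.SetImports("X", {"Y", "Z"});
    loader.SetImports("W", {"Z"});
    loader.SetLoadFromEntryPoint("Y", "W");
    loader.Load("X");
    return loader.Calls();
}
```

SimulateExample(LoaderPolicy::Legacy) produces Y, W, Z, X (W runs before its dependency Z), while SimulateExample(LoaderPolicy::Modern) produces Y, Z, W, X, matching the two traces in the text.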

To test this we can build the following solution: test.exe[ kernel32, D1< D2[kernel32, msvcrt] >, msvcrt ]

  • D2 imports from kernel32 and msvcrt only, and exports SomeFunc
  • D1 imports only from kernel32, calls LoadLibraryW(L"D2") from its entry point, and then calls D2.SomeFunc
  • test.exe imports from kernel32, D1 and msvcrt

(Exactly in this order! This is critically important: D1 must come before msvcrt in the import table, so D1 must be placed before msvcrt on the linker command line.)

As a result, the D1 entry point will be called before the msvcrt entry point. This is normal, since D1 does not depend on msvcrt. But when D1 loads D2 from its entry point, things become interesting.

Code for D2.dll ( /NODEFAULTLIB kernel32.lib msvcrt.lib ):

#include <Windows.h>

extern "C"
{
    __declspec(dllimport) int __cdecl sprintf(PSTR buf, PCSTR format, ...);
}

BOOLEAN WINAPI MyEp( HMODULE , DWORD ul_reason_for_call, PVOID )
{
    if (ul_reason_for_call == DLL_PROCESS_ATTACH)
    {
        OutputDebugStringA("D2.DllMain\n");
    }

    return TRUE;
}

INT_PTR WINAPI SomeFunc()
{
    __pragma(message(__FUNCDNAME__))
    char buf[32];
    // sprintf is used only to force an import from msvcrt.dll
    sprintf(buf, "D2.SomeFunc\n");
    OutputDebugStringA(buf);
    return 0;
}

#ifdef _WIN64
#define FuncName "?SomeFunc@@YA_JXZ"
#else
#define FuncName "?SomeFunc@@YGHXZ"
#endif

__pragma(comment(linker, "/export:" FuncName ",@1,NONAME,PRIVATE"))

Code for D1.dll ( /NODEFAULTLIB kernel32.lib ):

#include <Windows.h>

#pragma warning(disable : 4706)

BOOLEAN WINAPI MyEp( HMODULE hmod, DWORD ul_reason_for_call, PVOID )
{
    if (ul_reason_for_call == DLL_PROCESS_ATTACH)
    {
        OutputDebugStringA("D1.DllMain\n");
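        // Deliberately load D2 from inside the entry point: this is the case under test.
        if (hmod = LoadLibraryW(L"D2"))
        {
            // D2 exports SomeFunc by ordinal only (@1, NONAME), so look it up by ordinal.
            if (FARPROC fp = GetProcAddress(hmod, (PCSTR)1))
            {
                fp();
            }
        }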
    }

    return TRUE;
}

INT_PTR WINAPI SomeFunc()
{
    __pragma(message(__FUNCDNAME__))
    OutputDebugStringA("D1.SomeFunc\n");
    return 0;
}

#ifdef _WIN64
#define FuncName "?SomeFunc@@YA_JXZ"
#else
#define FuncName "?SomeFunc@@YGHXZ"
#endif

__pragma(comment(linker, "/export:" FuncName ",@1,NONAME"))

Code for the exe ( /NODEFAULTLIB kernel32.lib D1.lib msvcrt.lib ):

#include <Windows.h>

extern "C"
{
    __declspec(dllimport) int __cdecl sprintf(PSTR buf, PCSTR format, ...);
}

__declspec(dllimport) INT_PTR WINAPI SomeFunc();

void ep()
{
    char buf[32];
    // sprintf is used only to force an import from msvcrt.dll
    sprintf(buf, "exe entry\n");
    OutputDebugStringA(buf);
    ExitProcess((UINT)SomeFunc());
}

Output on xp:

LDR: D1.dll loaded - Calling init routine
D1.DllMain
Load: D2.dll
LDR: D2.dll loaded - Calling init routine
D2.DllMain
D2.SomeFunc
LDR: msvcrt.dll loaded - Calling init routine
exe entry
D1.SomeFunc

On win7:

LdrpRunInitializeRoutines - INFO: Calling init routine for DLL "D1.dll"
D1.DllMain
Load: D2.dll
LdrpRunInitializeRoutines - INFO: Calling init routine for DLL "D2.DLL"
D2.DllMain
D2.SomeFunc
LdrpRunInitializeRoutines - "msvcrt.dll"
exe entry
D1.SomeFunc

In both cases the call flow is the same: D2.DllMain is called before the msvcrt entry point, despite D2[msvcrt].

But on win8.1 and win10 the call flow is different:

LdrpInitializeNode - INFO: Calling init routine for DLL "D1.dll"
D1.DllMain
LdrpInitializeNode - INFO: Calling init routine for DLL "msvcrt.dll"
LdrpInitializeNode - INFO: Calling init routine for DLL "D2.DLL"
D2.DllMain
D2.SomeFunc
exe entry
D1.SomeFunc

The D2 entry point is called after msvcrt initialization.

So what is the conclusion?

If, at the time module X[Y] is loaded, there is no uninitialized Y in memory, then Y_DllMain will be called before X_DllMain. In other words, this holds as long as nobody calls LoadLibrary(X) (or LoadLibrary(Z[X])) from a DLL entry point. So if your DLL is loaded the "normal" way (not by a LoadLibrary call from DllMain, and not injected from a driver on some DLL load event), you can be sure that the CRT entry point has already been called (the CRT is initialized).

Moreover, if you run on win8.1+ and X[Y] is loaded, Y_DllMain will always be called before X_DllMain.


Now about a custom entry point (linker option /ENTRY) in your DLL.

Even if you use the CRT in separate DLLs, a small piece of CRT code is statically linked into your module: _DllMainCRTStartup, which calls your function DllMain (which is not itself the entry point) by name. So with the dynamic CRT we really have two CRT parts. The main part is in separate DLLs and will be initialized before your DLL entry point is called (except in the special case I described above, on win7, vista and xp). The other is a small static part (code inside your module). When this static part gets called depends entirely on you. This part, _DllMainCRTStartup, does some internal initialization, initializes the global objects in your code (_initterm) and calls DllMain; after DllMain returns (on DLL detach) it calls the destructors for the globals.

If you set a custom entry point in your DLL, then at the point it is called the CRT in the separate DLLs is already initialized, but your static CRT part is not (and neither are your global objects). From this custom entry point you will need to call _DllMainCRTStartup.
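
The ordering described for the static part can be illustrated with a portable toy model. Everything in it is a hypothetical stand-in (Log, Initializer, ModelCrtStartup, ModelDllMain, ModelCustomEntry); it is not the real CRT code and only covers process attach, not the destructor calls on detach.

```cpp
#include <string>
#include <vector>

using Log = std::vector<std::string>;
// Hypothetical stand-in for one global-constructor thunk in the module.
using Initializer = void (*)(Log&);

void ConstructGlobalA(Log& log) { log.push_back("construct A"); }
void ConstructGlobalB(Log& log) { log.push_back("construct B"); }

// Plays the role of the initializer table that the static CRT part walks.
const Initializer kInitializers[] = {ConstructGlobalA, ConstructGlobalB};

// Stand-in for the user's DllMain.
bool ModelDllMain(Log& log) {
    log.push_back("DllMain");
    return true;
}

// Model of the statically linked startup stub on attach:
// run every initializer first, then call the user's DllMain.
bool ModelCrtStartup(Log& log) {
    for (Initializer init : kInitializers) init(log);
    return ModelDllMain(log);
}

// Model of a custom entry point. If it does not forward to the stub,
// neither the global constructors nor DllMain ever run.
Log ModelCustomEntry(bool forwardToStub) {
    Log log;
    log.push_back("custom entry");
    if (forwardToStub) ModelCrtStartup(log);
    return log;
}
```

ModelCustomEntry(true) records the custom entry, both constructors and then DllMain, in that order. ModelCustomEntry(false) records only the custom entry, which corresponds to the case where the globals in the module stay unconstructed.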

RbMm answered Dec 04 '22 20:12
