
How does the loader map a DLL into the process address space?

Tags: c++, c, dll, loader

I am curious to know how the loader maps a DLL into a process's address space. How does the loader do that magic? An example would be highly appreciated.

Thanks in advance.

Mahesh asked Dec 03 '08 10:12


2 Answers

What level of detail are you looking for? At a basic level, all dynamic linkers work in pretty much the same way:

  1. Dynamic libraries are compiled to relocatable code (using relative jumps instead of absolute ones, for example).
  2. The dynamic linker finds an appropriately sized empty region in the memory map of the application, and reads the DLL's code and any static data into that region.
  3. The dynamic library contains a table of offsets to the start of each exported function. Calls to the DLL's functions in the client program are patched at load time with a new destination address, based on where the library was loaded.
  4. Most dynamic linkers have some mechanism for setting a preferred base address for a particular library. If a library is loaded at its preferred address, then the relocation work in steps 2 and 3 (fixing up addresses for a different load location) can be skipped; the library itself still has to be mapped.
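
The steps above can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyImage`, `toy_load` and `toy_resolve` are hypothetical names, the layout is not a real PE or ELF format, and the "load address" is a pretend 32-bit address rather than the address of the host buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy "library image"; NOT a real PE or ELF layout. */
typedef struct {
    uint32_t preferred_base;         /* address the image was linked for */
    uint32_t size;                   /* bytes of code plus static data */
    const uint8_t *bytes;            /* image contents as stored on disk */
    const uint32_t *reloc_offsets;   /* offsets of 32-bit absolute addresses in the image */
    size_t reloc_count;
    const char *const *export_names; /* exported function names */
    const uint32_t *export_offsets;  /* offset of each export from the image start */
    size_t export_count;
} ToyImage;

/* Steps 2 and 4: copy the image into memory and, only if it did not land at
 * its preferred base, add the difference to every recorded absolute address.
 * Returns a malloc'ed copy (caller frees) or NULL on allocation failure. */
uint8_t *toy_load(const ToyImage *img, uint32_t actual_base) {
    uint8_t *mem = malloc(img->size);
    if (mem == NULL)
        return NULL;
    memcpy(mem, img->bytes, img->size);

    uint32_t delta = actual_base - img->preferred_base;
    if (delta != 0) {
        for (size_t i = 0; i < img->reloc_count; i++) {
            uint32_t off = img->reloc_offsets[i];
            assert(off + sizeof(uint32_t) <= img->size);
            uint32_t value;
            memcpy(&value, mem + off, sizeof value); /* avoids unaligned access */
            value += delta;
            memcpy(mem + off, &value, sizeof value);
        }
    }
    return mem;
}

/* Step 3: turn an export's offset into an address for this particular load.
 * Returns 0 if the name is not exported. */
uint32_t toy_resolve(const ToyImage *img, uint32_t actual_base, const char *name) {
    for (size_t i = 0; i < img->export_count; i++)
        if (strcmp(img->export_names[i], name) == 0)
            return actual_base + img->export_offsets[i];
    return 0;
}
```

Loading the same image at its preferred base leaves the bytes untouched, which is the shortcut described in step 4.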
Mark Bessey answered Sep 28 '22 00:09


Okay, I'm assuming the Windows side of things here. When you load a PE file, the loader (contained in NTDLL) does the following:

  1. Locate each of the DLLs using the DLL search semantics (system and patch-level specific); well-known DLLs are more or less exempt from this.
  2. Map the file into memory as a memory-mapped file (MMF), where pages are copy-on-write (CoW).
  3. Traverse the import directory and, for each import, start (recursively) at point 1.
  4. Resolve relocations, which most of the time involves only a very limited number of entries, since the code itself is largely position-independent code (PIC).
  5. (IIRC) convert the EAT (export address table) entries from RVA (relative virtual address) to VA (virtual address within the current process's memory space) by adding the module's load address.
  6. Patch the IAT (import address table) so that each import references its actual address within the process's memory space.
  7. For a DLL, call DllMain(); for an EXE, create a thread whose start address is the entry point of the PE file (this is also oversimplified, because the actual start address is inside kernel32.dll for Win32 processes).
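
As a rough illustration of points 1, 3, 5 and 6, here is a toy model of recursive import resolution. Everything in it is an assumption made for the sketch: `ToyModule`, `ToyExport`, `ToyImport` and the functions are hypothetical, the "module list" stands in for the DLL search, and mapping, relocations and entry points are left out entirely. It is not how NTDLL is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const char *name; uint32_t rva; } ToyExport;

/* One import entry; `resolved` plays the role of the IAT slot. */
typedef struct { const char *dll; const char *func; uintptr_t resolved; } ToyImport;

typedef struct {
    const char *name;
    uintptr_t base;            /* pretend load address */
    int loaded;                /* set once the module counts as "mapped" */
    const ToyExport *exports;
    size_t export_count;
    ToyImport *imports;
    size_t import_count;
} ToyModule;

/* Point 1: "locate" a module; a linear search replaces the real search order. */
ToyModule *toy_find(ToyModule *mods, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(mods[i].name, name) == 0)
            return &mods[i];
    return NULL;
}

/* Point 5: RVA to VA is simply base + RVA. Returns 0 if not exported. */
uintptr_t toy_export_va(const ToyModule *m, const char *func) {
    for (size_t i = 0; i < m->export_count; i++)
        if (strcmp(m->exports[i].name, func) == 0)
            return m->base + m->exports[i].rva;
    return 0;
}

/* Points 3 and 6: load dependencies recursively, then fill in the IAT slots.
 * Returns 0 on success, -1 if a module or an export cannot be found. */
int toy_load_module(ToyModule *mods, size_t n, const char *name) {
    ToyModule *m = toy_find(mods, n, name);
    if (m == NULL)
        return -1;
    if (m->loaded)
        return 0;
    m->loaded = 1; /* marked before recursing so that import cycles terminate */

    for (size_t i = 0; i < m->import_count; i++) {
        ToyImport *imp = &m->imports[i];
        if (toy_load_module(mods, n, imp->dll) != 0)
            return -1;
        const ToyModule *dep = toy_find(mods, n, imp->dll);
        assert(dep != NULL); /* the recursive load above just succeeded */
        imp->resolved = toy_export_va(dep, imp->func);
        if (imp->resolved == 0)
            return -1;
    }
    return 0;
}
```

Calling `toy_load_module` for the EXE pulls in its whole dependency tree first, so by the time it returns every `resolved` field holds base plus RVA of the matching export.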

Now, when you compile code, it depends on the linker how an external function is referenced. Some linkers create stubs, so that (in theory) checking the function's address against NULL will always say it is not NULL. It's a quirk you have to be aware of if and when your linker is affected. Others reference the IAT entry directly, in which case the address of a not-yet-resolved function (think delay-loaded DLLs) can be NULL; the SEH handler will then invoke the delay-load helper and (attempt to) resolve the function address before resuming execution at the point where it failed.
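
To make the stub variant concrete, here is a minimal sketch of lazy binding through a function-pointer slot. It models only the stub approach, not the SEH-based path, and all names (`iat_add`, `add_stub`, `real_add`, `call_add`) are made up for the example; `real_add` stands in for a function that would really live in a delay-loaded DLL.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*add_fn)(int, int);

/* Stand-in for the real function inside the (hypothetical) delay-loaded DLL. */
static int real_add(int a, int b) { return a + b; }

static int resolve_count = 0; /* how many times the slot had to be resolved */

static int add_stub(int a, int b);

/* The "IAT slot": every call goes through this pointer. It starts out
 * pointing at the stub, so a NULL check on it never fires. */
add_fn iat_add = add_stub;

/* First call lands here: resolve the target, patch the slot, forward the call. */
static int add_stub(int a, int b) {
    resolve_count++;
    iat_add = real_add; /* later calls skip the stub entirely */
    return iat_add(a, b);
}

/* What the client code effectively does: an indirect call through the slot. */
int call_add(int a, int b) {
    assert(iat_add != NULL);
    return iat_add(a, b);
}

int toy_resolve_count(void) { return resolve_count; }
```

The slot is non-NULL both before and after resolution, which is why a NULL check tells you nothing about whether the function is actually available when stubs are involved.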

There is a lot of red tape involved in the above process, which I have oversimplified.

The gist of what you wanted to know is that the mapping into the process happens as an MMF, though you can artificially mimic the behavior with heap space. However, remember the point about CoW: that is the crux of the idea of DLLs. The same copy of (most of) the pages of a DLL is shared among all the processes that load that particular DLL. The pages that are not shared are the ones that were written to, for example when resolving relocations and similar things. In that case, each process has its own (now modified) copy of the original page.
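
The copy-on-write behavior can be observed with a small experiment. Note the assumption: this sketch uses the POSIX `mmap` call with `MAP_PRIVATE` as an analogue, not the Windows loader or the Win32 file-mapping API, and the function name `cow_write_stays_private` is made up. It maps a temporary file privately, writes to the mapping, and checks that the file itself is unchanged.

```c
#define _POSIX_C_SOURCE 200809L /* for fileno() under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Returns 1 if a write through a private mapping stayed private (the file is
 * unchanged), 0 if the file was modified, and -1 on any setup error. */
int cow_write_stays_private(void) {
    const char original[] = "original page contents";
    size_t len = sizeof original;

    FILE *f = tmpfile(); /* removed automatically when closed */
    if (f == NULL)
        return -1;
    if (fwrite(original, 1, len, f) != len || fflush(f) != 0) {
        fclose(f);
        return -1;
    }

    /* MAP_PRIVATE: pages are shared with the file until the first write. */
    char *view = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                      fileno(f), 0);
    if (view == MAP_FAILED) {
        fclose(f);
        return -1;
    }
    assert(view[0] == 'o'); /* the mapping shows the file's contents */

    view[0] = 'X'; /* this write gives the process its own copy of the page */

    /* Read the file back through ordinary I/O and compare. */
    char on_disk[sizeof original];
    rewind(f);
    size_t got = fread(on_disk, 1, len, f);
    int unchanged = (got == len && memcmp(on_disk, original, len) == 0);
    int modified_view = (view[0] == 'X');

    munmap(view, len);
    fclose(f);
    return (unchanged && modified_view) ? 1 : 0;
}
```

The modified page exists only in the writing process, while untouched pages can keep being backed by the single shared copy of the file. That mirrors what happens to DLL pages that relocations have to touch.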

And a word of warning concerning EXE packers applied to DLLs. They defeat exactly the CoW mechanism I described, in that they allocate space for the unpacked contents of the DLL on the heap of the process into which the DLL is loaded. So while the actual file contents are still mapped as an MMF and shared, the unpacked contents occupy the same amount of memory again in each process that loads the DLL instead of being shared.

0xC0000022L answered Sep 28 '22 01:09