I know the relationship between the two:
virtual address mod page alignment == file offset mod page alignment
But can someone tell me in which direction are these two numbers computed?
Is virtual address computed from file offset according to the relationship above, or vice versa?
Here is some more detail: when the linker writes the ELF file, it sets the virtual address and file offset of each program header (segment).
For example, here is the output of readelf -l someELFfile:
Elf file type is EXEC (Executable file)
Entry point 0x8048094
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x00154 0x00154 R E 0x1000
  LOAD           0x000154 0x08049154 0x08049154 0x00004 0x00004 RW  0x1000
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
We can see 2 LOAD segments.
The virtual address of the first LOAD ends at 0x8048154, while the second LOAD starts at 0x8049154.
In the ELF file, the second LOAD sits right behind the first one, at file offset 0x00154. However, when this ELF is loaded into memory, the second LOAD starts 0x1000 bytes after the end of the first LOAD segment.
But why? If we have to consider memory page alignment, why doesn't the second LOAD segment start at 0x08049000? Why does it start at 0x1000 bytes AFTER THE END of the first LOAD segment?
I know the virtual address of the second LOAD satisfies the relationship:
virtual address mod page alignment == file offset mod page alignment
But I don't know why this relationship must be satisfied.
Why does it start at 0x1000 bytes AFTER THE END of the first LOAD segment?
If it didn't, it would have to start at 0x08048154, but it can't: the two LOAD segments have different flags specified for their mapping (the first is mapped with PROT_READ|PROT_EXEC, the second with PROT_READ|PROT_WRITE). Protections (being part of the page table) can only apply to whole pages, not parts of a page. Therefore, the mappings with different protections must belong to different pages.
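To make the page-sharing problem concrete, here is a small sketch that lists which pages each segment touches, using the values from the readelf output in the question. The helper name pages_touched is made up for illustration, and the page size of 0x1000 is taken from the Align column.

```python
PAGE = 0x1000  # page size, taken from the Align column of the LOAD entries

def pages_touched(vaddr, memsz, page=PAGE):
    """Return the set of page base addresses covered by [vaddr, vaddr + memsz)."""
    first = vaddr & ~(page - 1)
    last = (vaddr + memsz - 1) & ~(page - 1)
    return set(range(first, last + page, page))

text = pages_touched(0x08048000, 0x154)        # first LOAD, flags R E
data_actual = pages_touched(0x08049154, 0x4)   # second LOAD, flags RW, as the linker placed it
data_packed = pages_touched(0x08048154, 0x4)   # hypothetical: placed right behind the first LOAD

print([hex(p) for p in sorted(text)])  # ['0x8048000']
print(bool(text & data_actual))        # False: no page is shared
print(bool(text & data_packed))        # True: one page would need both R E and RW
```

With the hypothetical packed placement, a single page would need two different protections at once, which is exactly what the page table cannot express.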
virtual address mod page alignment == file offset mod page alignment
But I don't know why this relationship must be satisfied.
The LOAD segments are directly mmaped from the file. The actual mapping of the second LOAD segment performed for your example will look something like this (you can run your program under strace and see that it does):
mmap(0x08049000, 0x158, PROT_READ|PROT_WRITE, MAP_PRIVATE, $fd, 0)
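The arguments in that call follow from the program header fields by rounding down to a page boundary. Below is a minimal sketch of the arithmetic only. It assumes a loader that maps each segment with a single call and ignores the case where MemSiz is larger than FileSiz; mmap_args is a hypothetical helper name, not something a real loader exports.

```python
def mmap_args(offset, vaddr, filesz, align=0x1000):
    """Round a segment down to page boundaries; return (addr, length, file_offset)."""
    # The linker guarantees this congruence; without it, no page-aligned
    # mapping could place the file bytes at the requested address.
    assert vaddr % align == offset % align
    slack = vaddr % align  # bytes between the page start and the segment start
    return vaddr - slack, filesz + slack, offset - slack

# Second LOAD from the readelf output: Offset 0x154, VirtAddr 0x08049154, FileSiz 0x4
addr, length, file_off = mmap_args(0x154, 0x08049154, 0x4)
print(hex(addr), hex(length), hex(file_off))  # 0x8049000 0x158 0x0
```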
If you try to make the virtual address or the offset non-page-aligned, mmap will fail with EINVAL. The only way to make file data appear in virtual memory at the desired address is to make VirtAddr congruent to Offset modulo Align, and that is exactly what the static linker does.
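If you want to see the offset half of that restriction without writing C, here is a rough sketch using Python's mmap module. Python does not let you choose the mapping address, so only the file offset is exercised. The helper try_map and the scratch file are assumptions made for the demo, and the exact exception type may differ between platforms, so the code only reports whether the mapping succeeded.

```python
import mmap
import tempfile

def try_map(fd, length, offset):
    """Attempt a private mapping; return 'ok' or the name of the exception raised."""
    try:
        m = mmap.mmap(fd, length, access=mmap.ACCESS_COPY, offset=offset)
    except (OSError, ValueError) as exc:
        return type(exc).__name__
    m.close()
    return "ok"

with tempfile.TemporaryFile() as f:
    # Scratch file large enough for both attempts.
    f.write(b"\0" * (2 * mmap.ALLOCATIONGRANULARITY))
    f.flush()
    aligned = try_map(f.fileno(), 4, mmap.ALLOCATIONGRANULARITY)  # aligned offset
    unaligned = try_map(f.fileno(), 4, 0x154)                     # offset from the example

print(aligned, unaligned)
```

The aligned request should succeed, while the request at offset 0x154 should be rejected, which is why the loader has to map from offset 0 and rely on the congruence instead.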
Note that for such a small first LOAD segment, the entire first segment also appears at the beginning of the second mapping (with the wrong protections). But the program is not supposed to access anything in the [0x08049000,0x08049154) range. In general, it is almost always the case that there is some "junk" before the start of actual data in the second LOAD segment (unless you get really lucky and the first LOAD segment ends on a page boundary).
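The "junk" can be reproduced with a stand-in file. This is only a sketch: the file contents (0x154 bytes of "T" followed by the 4 bytes "DATA") are invented to mimic the layout in the question, and Python picks the mapping address itself, so the comments note where each slice would land in the real process.

```python
import mmap
import tempfile

TEXT_SIZE = 0x154   # FileSiz of the first LOAD
DATA = b"DATA"      # stand-in for the 4 bytes of the second LOAD

with tempfile.TemporaryFile() as f:
    # Lay the file out like the example: first segment, second right behind it.
    f.write(b"T" * TEXT_SIZE + DATA)
    f.flush()
    # File offset 0 and length 0x158, as in the mmap call quoted above.
    m = mmap.mmap(f.fileno(), TEXT_SIZE + len(DATA), access=mmap.ACCESS_COPY, offset=0)
    junk = m[:TEXT_SIZE]   # would appear at [0x08049000, 0x08049154)
    data = m[TEXT_SIZE:]   # would appear at 0x08049154
    m.close()

print(junk == b"T" * TEXT_SIZE, data)  # True b'DATA'
```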
See also mmap man page.