Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ELF program header virtual address and file offset

I know the relationship between the two:

virtual address mod page alignment == file offset mod page alignment

But can someone tell me in which direction are these two numbers computed?

Is virtual address computed from file offset according to the relationship above, or vice versa?

Update

Here is some more detail: when the linker writes the ELF file header, it sets the virtual address and file offset of the program headers.(segments)

For example there's the output of readelf -l someELFfile:

Elf file type is EXEC (Executable file)
Entry point 0x8048094
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x08048000 0x08048000 0x00154 0x00154 R E 0x1000
  LOAD           0x000154 0x08049154 0x08049154 0x00004 0x00004 RW  0x1000
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10

We can see 2 LOAD segments.

The virtual address of the first LOAD ends at 0x8048154, while the second LOAD starts at 0x8049154.

In the ELF file, the second LOAD is right behind the first LOAD with file offset 0x00154, however when this ELF is loaded into memory it starts at 0x1000 bytes after the end of the first LOAD segment.

But, why? If we have to consider memory page alignment, why doesn't the second LOAD segment starts at 0x80489000? Why does it start at 0x1000 bytes AFTER THE END of the first LOAD segment?

I know the virtual address of the second LOAD satisfies the relationship:

virtual address mod page alignment == file offset mod page alignment

But I don't know why this relationship must be satisfied.

like image 428
star.lit Avatar asked Mar 04 '17 18:03

star.lit


People also ask

What is offset in ELF?

The Global Offset Table, or GOT, is a section of a computer program's (executables and shared libraries) memory used to enable computer program code compiled as an ELF file to run correctly, independent of the memory address where the program's code or data is loaded at runtime.

What is an ELF program header?

ELF program headers are what describe segments within a binary and are necessary for program loading. Segments are understood by the kernel during load time and describe the memory layout of an executable on disk and how it should translate to memory.

What is ELF file section?

A section is the smallest unit of an object that can be relocated. Use the elfdump command to inspect the components of an object or executable file generated by the assembler. The following sections are commonly present in an ELF file: Section header. Executable text.

How do you get ELF headers?

To find them the ELF header is used, which is located at the very start of the file. The first bytes contain the elf magic "\x7fELF" , followed by the class ID (32 or 64 bit ELF file), the data format ID (little endian/big endian), the machine type, etc. Finally, the entry point of this file is at address 0x0.


1 Answers

Why does it start at 0x1000 bytes AFTER THE END of the first LOAD segment?

If it didn't, it would have to start at 0x08048154, but it can't: the two LOAD segments have different flags specified for their mapping (the first is mapped with PROT_READ|PROT_EXEC, the second with PROT_READ|PROTO_WRITE. Protections (being part of the page table) can only apply to whole pages, not parts of a page. Therefore, the mappings with different protections must belong to different pages.

virtual address mod page alignment == file offset mod page alignment
But I don't know why this relationship must be satisfied.

The LOAD segments are directly mmaped from file. The actual mapping of the second LOAD segment performed for your example will look something like this (you can run your program under strace and see that it does):

mmap(0x08049000, 0x158, PROT_READ|PROT_WRITE, MAP_PRIVATE, $fd, 0)

If you try to make the virtual address or the offset non-page-aligned, mmap will fail with EINVAL. The only way to make file data to appear in virtual memory at desired address it to make VirtAddr congruent to Offset modulo Align, and that is exactly what the static linker does.

Note that for such a small first LOAD segment, the entire first segment also appears at the beginning of the second mapping (with the wrong protections). But the program is not supposed to access anything in the [0x08049000,0x08049154) range. In general, it is almost always the case that there is some "junk" before the start of actual data in the second LOAD segment (unless you get really lucky and the first LOAD segment ends on a page boundary).

See also mmap man page.

like image 164
Employed Russian Avatar answered Sep 24 '22 13:09

Employed Russian