I have made this simple ELF for learning purposes:
bits 64
org 0x08048000
elfHeader:
db 0x7F, "ELF", 2, 1, 1, 0 ; e_ident
db 0 ; abi version
times 7 db 0 ; unused padding
dw 2 ; e_type
dw 62 ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq programHeader - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw elfHeaderSize ; e_ehsize
dw programHeaderSize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
elfHeaderSize equ $ - elfHeader
programHeader:
dd 1 ; p_type
dd 7 ; p_flags
dq 0 ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq fileSize ; p_filesz
dq fileSize ; p_memsz
dq 0x1000 ; p_align
programHeaderSize equ $ - programHeader
_start:
xor rdi, rdi
xor eax,eax
mov al,60
syscall
fileSize equ $ - $$
To assemble that code I use NASM:
nasm -f bin exe.asm -o exe
If you take a look at the programHeader, you will see that p_offset is 0 and p_filesz is fileSize. That means that the segment contains the whole file, headers included. That's something I wasn't expecting (and I'm not the only one), but apparently the Linux operating system needs the headers to be in a segment of type PT_LOAD so that information gets loaded.
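As a cross-check of that layout, here is a minimal Python sketch that rebuilds the same image and reads the program header back. The struct formats follow the field order of the listing above; the names BASE, CODE, and parse_phdr are my own, and the machine-code bytes are a hand encoding of the four instructions that I am assuming matches what NASM emits.

```python
import struct

BASE = 0x08048000  # same value as `org 0x08048000`
# Assumed encoding of: xor rdi,rdi / xor eax,eax / mov al,60 / syscall
CODE = bytes.fromhex("4831ff31c0b03c0f05")

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes

file_size = EHDR.size + PHDR.size + len(CODE)
entry = BASE + EHDR.size + PHDR.size       # address of _start

ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
image = (
    EHDR.pack(ident, 2, 62, 1, entry, EHDR.size, 0, 0,
              EHDR.size, PHDR.size, 1, 0, 0, 0)
    + PHDR.pack(1, 7, 0, BASE, BASE, file_size, file_size, 0x1000)
    + CODE
)


def parse_phdr(data):
    """Return the first program header as a dict, located via e_phoff."""
    e_phoff = EHDR.unpack_from(data, 0)[5]
    names = ("p_type", "p_flags", "p_offset", "p_vaddr",
             "p_paddr", "p_filesz", "p_memsz", "p_align")
    return dict(zip(names, PHDR.unpack_from(data, e_phoff)))


ph = parse_phdr(image)
seg_end = ph["p_offset"] + ph["p_filesz"]
headers_end = EHDR.size + PHDR.size
print(hex(ph["p_offset"]), hex(seg_end), hex(headers_end))
# The headers occupy [0, headers_end), which lies inside [p_offset, seg_end)
print(ph["p_offset"] == 0 and headers_end <= seg_end)
```

The segment's file range starts at offset 0 and ends at the end of the file, so both headers fall inside it.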
This is the only resource I could find that mentions the fact that the headers are inside a segment: https://www.intezer.com/blog/research/executable-linkable-format-101-part1-sections-segments/
Something important to highlight about segments is that only PT_LOAD segments get loaded into memory. Therefore, every other segment is mapped within the memory range of one of the PT_LOAD segments.
In order to understand the relationship between Sections and Segments, we can picture segments as a tool to make the linux loader’s life easier, as they group sections by attributes into single segments in order to make the loading process of the executable more efficient, instead of loading each individual section into memory. The following diagram attempts to illustrate this concept:
But I don't understand why Linux needs those headers to be loaded at run time. What are they used for? If they are needed for the process to run, couldn't Linux load them by itself?
EDIT:
It has been mentioned in the comments that the headers don't need to be loaded; however, they are sometimes loaded anyway to avoid having to add padding. I have tried adding padding to get the code 4 KB aligned, but it didn't work. Here's my attempt:
bits 64
org 0x08048000
elfHeader:
db 0x7F, "ELF", 2, 1, 1, 0 ; e_ident
db 0 ; abi version
times 7 db 0 ; unused padding
dw 2 ; e_type
dw 62 ; e_machine
dd 1 ; e_version
dq _start ; e_entry
dq programHeader - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
dw elfHeaderSize ; e_ehsize
dw programHeaderSize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
elfHeaderSize equ $ - elfHeader
programHeader:
dd 1 ; p_type
dd 7 ; p_flags
dq _start - $$ ; p_offset
dq $$ ; p_vaddr
dq $$ ; p_paddr
dq codeSize ; p_filesz
dq codeSize ; p_memsz
dq 0x1000 ; p_align
programHeaderSize equ $ - programHeader
; padding until 4KB
paddingUntil4k equ 4*1024 - ($ - elfHeader)
times paddingUntil4k db 0
_start:
xor rdi, rdi
xor eax,eax
mov al,60
syscall
codeSize equ $ - _start
fileSize equ $ - $$
The segments contain information that is necessary for runtime execution of the file, while sections contain important data for linking and relocation. Any byte in the entire file can be owned by at most one section, and there can be orphan bytes which are not owned by any section.
ELF program headers are what describe segments within a binary and are necessary for program loading. Segments are understood by the kernel during load time and describe the memory layout of an executable on disk and how it should translate to memory.
In executable files, sections are optional, but it's nice to have them, because they describe what's in the file and allow for dumping selected parts of it (e.g. with the objdump tool). Sometimes they are needed, though, for storing dynamic linking information, symbol tables, debugging information, stuff like that.
But I don't understand why Linux needs those headers to be loaded at run time.
It doesn't.
What are they used for? If they are needed for the process to run, couldn't Linux load them by itself?
To answer all of these questions, you need to look at the Linux kernel source.
In the source, you can see that in fact program headers do not need to be a part of any PT_LOAD segment, and that the kernel will read them all on its own.
Changing your original program like so:
diff -u exe.asm.orig exe.asm
--- exe.asm.orig 2021-02-07 18:54:34.449336515 -0800
+++ exe.asm 2021-02-07 18:53:19.773532451 -0800
@@ -24,9 +24,9 @@
programHeader:
dd 1 ; p_type
dd 7 ; p_flags
- dq 0 ; p_offset
- dq $$ ; p_vaddr
- dq $$ ; p_paddr
+ dq _start - $$ ; p_offset
+ dq _start ; p_vaddr
+ dq _start ; p_paddr
dq fileSize ; p_filesz
dq fileSize ; p_memsz
dq 0x1000 ; p_align
produces a program which runs fine, but in which the program header is not in the PT_LOAD segment:
eu-readelf --all exe
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Ident Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: AMD x86-64
Version: 1 (current)
Entry point address: 0x8048078
Start of program headers: 64 (bytes into file)
Start of section headers: 0 (bytes into file)
Flags:
Size of this header: 64 (bytes)
Size of program header entries: 56 (bytes)
Number of program headers entries: 1
Size of section header entries: 0 (bytes)
Number of section headers entries: 0 ([0] not available)
Section header string table index: 0
Section Headers:
[Nr] Name Type Addr Off Size ES Flags Lk Inf Al
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000078 0x0000000008048078 0x0000000008048078 0x000081 0x000081 RWE 0x1000
I have tried adding padding
You didn't do that correctly. Using your "with padding" source results in the following exe-padding:
...
Entry point address: 0x8049000
...
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0x0000000008048000 0x0000000008048000 0x000009 0x000009 RWE 0x1000
This binary is started by the kernel and immediately jumps to the entry address 0x8049000, which isn't mapped (since it's not covered by the PT_LOAD segment), resulting in an immediate SIGSEGV.
To fix this, you need to adjust the entry address:
diff -u exe-padding.asm.orig exe-padding.asm
--- exe-padding.asm.orig 2021-02-07 18:57:31.800871195 -0800
+++ exe-padding.asm 2021-02-07 19:34:27.303071700 -0800
@@ -8,7 +8,7 @@
dw 2 ; e_type
dw 62 ; e_machine
dd 1 ; e_version
- dq _start ; e_entry
+ dq _start - 0x1000 ; e_entry
dq programHeader - $$ ; e_phoff
dq 0 ; e_shoff
dd 0 ; e_flags
This again produces a working executable. For the record:
eu-readelf --all exe-padding
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Ident Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: AMD x86-64
Version: 1 (current)
Entry point address: 0x8048000
Start of program headers: 64 (bytes into file)
Start of section headers: 0 (bytes into file)
Flags:
Size of this header: 64 (bytes)
Size of program header entries: 56 (bytes)
Number of program headers entries: 1
Size of section header entries: 0 (bytes)
Number of section headers entries: 0 ([0] not available)
Section header string table index: 0
Section Headers:
[Nr] Name Type Addr Off Size ES Flags Lk Inf Al
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x001000 0x0000000008048000 0x0000000008048000 0x000009 0x000009 RWE 0x1000
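To make the difference between the two exe-padding variants concrete, here is a small Python sketch that checks whether the entry address falls inside the page-rounded range of the single LOAD segment. The numbers are taken from the two readelf outputs above; the helper names and the 4 KiB page size are my assumptions.

```python
PAGE = 0x1000  # assumed 4 KiB pages, matching p_align


def mapped_range(p_vaddr, p_memsz, page=PAGE):
    """Page-rounded [start, end) address range covered by one PT_LOAD."""
    start = p_vaddr & ~(page - 1)
    end = (p_vaddr + p_memsz + page - 1) & ~(page - 1)
    return start, end


def entry_is_mapped(e_entry, p_vaddr, p_memsz):
    """True if e_entry lies inside the page-rounded segment range."""
    start, end = mapped_range(p_vaddr, p_memsz)
    return start <= e_entry < end


# Entry 0x8049000 is the original attempt, 0x8048000 the adjusted one.
broken = entry_is_mapped(0x8049000, 0x8048000, 0x9)
fixed = entry_is_mapped(0x8048000, 0x8048000, 0x9)
print(broken, fixed)
```

With these inputs only the adjusted entry address lands inside the mapped page, which is consistent with the SIGSEGV described above.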
P.S. You are linking your 64-bit program at 0x08048000, which is the traditional load address for i*86 (32-bit) executables. x86_64 binaries traditionally start at 0x400000.
Update:
About the first example: p_filesz is still fileSize, so I think the segment would extend past the end of the file.
That is correct: p_filesz and p_memsz should be reduced by the size of headers (0x78 here). Note that both of these will be rounded up to page size (after adding p_offset), so for this example there is no practical difference.
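The rounding can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming 4 KiB pages and a helper name (file_pages) of my own; it is not the kernel's code.

```python
PAGE = 0x1000  # assumed 4 KiB pages


def file_pages(p_offset, p_filesz, page=PAGE):
    """File-offset range [start, end) after rounding out to whole pages."""
    start = p_offset & ~(page - 1)
    end = (p_offset + p_filesz + page - 1) & ~(page - 1)
    return start, end


as_posted = file_pages(0x78, 0x81)        # p_filesz left at fileSize
reduced = file_pages(0x78, 0x81 - 0x78)   # p_filesz minus the header size
print(as_posted, reduced, as_posted == reduced)
```

Both variants come out as the single page starting at file offset 0. That also suggests the header bytes end up in memory anyway in this example, even though the segment no longer declares them.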
Update 2:
pastebin.ubuntu.com/p/rgfVMrbcmJ
This results in the following LOAD segment:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000078 0x0000000008048000 0x0000000008048000 0x000081 0x000081 RWE 0x1000
This binary will not run (the kernel will reject it), because it is asking the kernel to do the impossible: to mmap the bytes at file offset 0x78 to the start of a page.
If the application performed an equivalent mmap call, it would get an EINVAL error, because mmap only accepts page-aligned file offsets, so a segment can be mapped only when (offset % pagesize) == (addr % pagesize).
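Here is a small Python sketch of both points, assuming a Linux-like system. congruent() checks the offset/address rule for the LOAD segments shown in this answer, and try_mmap() attempts a real mapping of a scratch file at a given offset. Both helper names are mine, and the scratch file merely stands in for the executable.

```python
import errno
import mmap
import tempfile

PAGE = 0x1000  # assumed 4 KiB pages for the ELF arithmetic


def congruent(p_offset, p_vaddr, page=PAGE):
    """True if the file offset and address agree modulo the page size."""
    return p_offset % page == p_vaddr % page


def try_mmap(offset, length=16):
    """Return None if mmap succeeds, else the errno of the failure."""
    with tempfile.TemporaryFile() as f:
        f.truncate(2 * mmap.PAGESIZE)  # scratch file, two pages long
        try:
            m = mmap.mmap(f.fileno(), length,
                          access=mmap.ACCESS_READ, offset=offset)
        except OSError as exc:
            return exc.errno
        m.close()
        return None


print(congruent(0x78, 0x8048078))    # working variant from this answer
print(congruent(0x78, 0x8048000))    # rejected variant from Update 2
print(try_mmap(0) is None)
print(try_mmap(0x78) == errno.EINVAL)
```

The unaligned offset is refused by mmap itself, which is why the loader cannot honor a segment whose p_offset and p_vaddr disagree modulo the page size.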