In ELF, why do the headers need to be in one segment?

I have written this minimal ELF executable by hand, for learning purposes:

bits 64
org 0x08048000

elfHeader:
    db  0x7F, "ELF", 2, 1, 1, 0   ; e_ident
    db 0                            ; abi version
    times 7 db 0                    ; unused padding
    dw  2                         ; e_type
    dw  62                        ; e_machine
    dd  1                         ; e_version
    dq  _start                    ; e_entry
    dq  programHeader - $$        ; e_phoff
    dq  0                         ; e_shoff
    dd  0                         ; e_flags
    dw  elfHeaderSize             ; e_ehsize
    dw  programHeaderSize         ; e_phentsize
    dw  1                         ; e_phnum
    dw  0                         ; e_shentsize
    dw  0                         ; e_shnum
    dw  0                         ; e_shstrndx

elfHeaderSize  equ $ - elfHeader

programHeader:
    dd  1                         ; p_type
    dd  7                         ; p_flags
    dq  0                         ; p_offset
    dq  $$                        ; p_vaddr
    dq  $$                        ; p_paddr
    dq  fileSize                  ; p_filesz
    dq  fileSize                  ; p_memsz
    dq  0x1000                    ; p_align

programHeaderSize equ  $ - programHeader

_start:
   xor rdi, rdi
   xor eax,eax
   mov al,60
   syscall

fileSize      equ     $ - $$

To assemble it, I use NASM:

nasm -f bin exe.asm -o exe

If you look at programHeader, you will see that p_offset is 0 and p_filesz is fileSize. That means the segment contains the whole file, headers included. That's something I wasn't expecting (and I'm not the only one), but apparently Linux needs the headers to be in a segment of type PT_LOAD so that they get loaded.
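To see why p_filesz ends up covering the whole file, the layout can be worked out from the structure sizes. This is a minimal Python sketch, assuming the field order used in the listing above (the struct format strings are my transcription of those fields) and that the four instructions at _start assemble to 3 + 2 + 2 + 2 = 9 bytes:

```python
import struct

# ELF64 header fields in the order the listing emits them (little-endian):
# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
EHDR_FMT = "<16sHHIQQQIHHHHHH"
# ELF64 program header: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align
PHDR_FMT = "<IIQQQQQQ"

ORG = 0x08048000           # the `org` address from the listing
CODE_SIZE = 3 + 2 + 2 + 2  # xor rdi,rdi / xor eax,eax / mov al,60 / syscall

ehdr_size = struct.calcsize(EHDR_FMT)  # "<" means no alignment padding
phdr_size = struct.calcsize(PHDR_FMT)
start_offset = ehdr_size + phdr_size   # _start follows the single phdr
file_size = start_offset + CODE_SIZE   # fileSize equ $ - $$

print(f"e_ehsize    = {ehdr_size}")
print(f"e_phentsize = {phdr_size}")
print(f"_start      = file offset {start_offset:#x}, address {ORG + start_offset:#x}")
print(f"fileSize    = {file_size:#x}")
```

This prints 64 and 56 for the header sizes, places _start at file offset 0x78 (address 0x8048078) and gives fileSize = 0x81. With p_offset = 0, a p_filesz of 0x81 therefore spans both headers and the code.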

This is the only resource I could find that mentions the fact that the headers are inside one segment: https://www.intezer.com/blog/research/executable-linkable-format-101-part1-sections-segments/

Something important to highlight about segments is that only PT_LOAD segments get loaded into memory. Therefore, every other segment is mapped within the memory range of one of the PT_LOAD segments.

In order to understand the relationship between Sections and Segments, we can picture segments as a tool to make the linux loader’s life easier, as they group sections by attributes into single segments in order to make the loading process of the executable more efficient, instead of loading each individual section into memory. The following diagram attempts to illustrate this concept:

[Diagram: sections grouped by attributes into segments (image not available)]
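The quoted claim ("every other segment is mapped within the memory range of one of the PT_LOAD segments") can be phrased as a range check. The sketch below is only an illustration of that claim: each segment is reduced to a hypothetical (p_type, p_vaddr, p_memsz) tuple, the layout in `example` is made up for the demonstration, and only the PT_* numbers are the standard ELF values.

```python
# Standard ELF p_type values.
PT_LOAD, PT_DYNAMIC, PT_INTERP, PT_PHDR = 1, 2, 3, 6

def uncovered(segments):
    """Return the non-PT_LOAD segments that no PT_LOAD memory range contains.

    Each segment is a (p_type, p_vaddr, p_memsz) tuple (a simplification).
    """
    loads = [(vaddr, vaddr + memsz)
             for ptype, vaddr, memsz in segments if ptype == PT_LOAD]
    return [
        seg for seg in segments
        if seg[0] != PT_LOAD
        and not any(lo <= seg[1] and seg[1] + seg[2] <= hi for lo, hi in loads)
    ]

# Hypothetical layout, loosely shaped like a small dynamically linked program.
example = [
    (PT_PHDR,    0x400040, 0x1F8),
    (PT_INTERP,  0x400238, 0x1C),
    (PT_LOAD,    0x400000, 0x1000),
    (PT_LOAD,    0x600E10, 0x230),
    (PT_DYNAMIC, 0x600E28, 0x1D0),
]
print(uncovered(example))

# A made-up segment lying outside every PT_LOAD range gets flagged.
stray = (PT_DYNAMIC, 0x700000, 0x10)
print(uncovered(example + [stray]))
```

For the example layout the function returns an empty list; adding the stray segment makes it report that one entry.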

But I don't understand why Linux needs the headers to be loaded at run time. What are they used for? If they are needed for the process to run, couldn't Linux load them by itself?

EDIT:

It has been mentioned in the comments that the headers don't need to be loaded; however, they are sometimes loaded anyway to avoid having to add padding. I tried adding padding so that the code starts at a 4 KB-aligned file offset, but it didn't work. Here's my attempt:

bits 64
org 0x08048000

elfHeader:
    db  0x7F, "ELF", 2, 1, 1, 0   ; e_ident
    db 0                            ; abi version
    times 7 db 0                    ; unused padding
    dw  2                         ; e_type
    dw  62                        ; e_machine
    dd  1                         ; e_version
    dq  _start                    ; e_entry
    dq  programHeader - $$        ; e_phoff
    dq  0                         ; e_shoff
    dd  0                         ; e_flags
    dw  elfHeaderSize             ; e_ehsize
    dw  programHeaderSize         ; e_phentsize
    dw  1                         ; e_phnum
    dw  0                         ; e_shentsize
    dw  0                         ; e_shnum
    dw  0                         ; e_shstrndx

elfHeaderSize  equ $ - elfHeader

programHeader:
    dd  1                         ; p_type
    dd  7                         ; p_flags
    dq  _start - $$               ; p_offset
    dq  $$                        ; p_vaddr
    dq  $$                        ; p_paddr
    dq  codeSize                  ; p_filesz
    dq  codeSize                  ; p_memsz
    dq  0x1000                    ; p_align

programHeaderSize equ  $ - programHeader

; padding until 4KB
paddingUntil4k equ 4*1024 - ($ - elfHeader)
times paddingUntil4k db 0


_start:
   xor rdi, rdi
   xor eax,eax
   mov al,60
   syscall

codeSize equ $ - _start
fileSize equ $ - $$
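The arithmetic behind this padding attempt can be spelled out. A small Python sketch, assuming the same 64-byte ELF header, one 56-byte program header and 9 bytes of code as in the first listing; it only reproduces what the listing computes, it does not run anything:

```python
ORG = 0x08048000     # `org` in the listing; p_vaddr is still $$ (this value)
HEADERS = 64 + 56    # e_ehsize + e_phentsize * e_phnum for this file
CODE_SIZE = 9        # the four instructions at _start

padding = 4 * 1024 - HEADERS       # paddingUntil4k
start_offset = HEADERS + padding   # file offset of _start, used as p_offset
start_addr = ORG + start_offset    # address NASM assigns to _start (e_entry)

print(f"paddingUntil4k = {padding}")
print(f"_start: file offset {start_offset:#x}, address {start_addr:#x}")
print(f"segment: p_offset {start_offset:#x}, p_vaddr {ORG:#x}, p_filesz {CODE_SIZE}")
```

So the code lands at file offset 0x1000 as intended, while `org` makes NASM assign _start the address 0x8049000 even though p_vaddr stays at 0x8048000.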
Asked by tuket, Feb 03 '21 at 20:02




1 Answer

But I don't understand why Linux needs the headers to be loaded at run time.

It doesn't.

What are they used for? If they are needed for the process to run, couldn't Linux load them by itself?

To answer all of these questions, you need to look at the Linux kernel source (the ELF loader lives in fs/binfmt_elf.c).

In the source, you can see that program headers do not, in fact, need to be part of any PT_LOAD segment: the kernel reads them from the file on its own.
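Reading the program header table needs nothing but e_phoff, e_phentsize and e_phnum from the ELF header; no segment has to be mapped first. The kernel's loader does roughly the equivalent in C. Below is a user-space Python sketch of that idea (the `read_phdrs` name is hypothetical), assuming a little-endian ELF64 file:

```python
import struct

EHDR_FMT = "<16sHHIQQQIHHHHHH"  # ELF64 header, little-endian
PHDR_FMT = "<IIQQQQQQ"          # ELF64 program header

def read_phdrs(image: bytes):
    """Return the program headers as tuples, read straight from the file image.

    Each tuple is (p_type, p_flags, p_offset, p_vaddr, p_paddr,
    p_filesz, p_memsz, p_align).
    """
    if image[:4] != b"\x7fELF" or image[4] != 2:  # EI_CLASS must be ELFCLASS64
        raise ValueError("not an ELF64 file")
    fields = struct.unpack_from(EHDR_FMT, image, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    if e_phentsize != struct.calcsize(PHDR_FMT):
        raise ValueError("unexpected e_phentsize")
    return [struct.unpack_from(PHDR_FMT, image, e_phoff + i * e_phentsize)
            for i in range(e_phnum)]
```

Usage would be along the lines of `read_phdrs(open("exe", "rb").read())`; whether the table happens to sit inside a PT_LOAD segment plays no role in reading it.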

Changing your original program like so:

diff -u exe.asm.orig exe.asm
--- exe.asm.orig        2021-02-07 18:54:34.449336515 -0800
+++ exe.asm     2021-02-07 18:53:19.773532451 -0800
@@ -24,9 +24,9 @@
 programHeader:
     dd  1                         ; p_type
     dd  7                         ; p_flags
-    dq  0                         ; p_offset
-    dq  $$                        ; p_vaddr
-    dq  $$                        ; p_paddr
+    dq  _start - $$               ; p_offset
+    dq  _start                    ; p_vaddr
+    dq  _start                    ; p_paddr
     dq  fileSize                  ; p_filesz
     dq  fileSize                  ; p_memsz
     dq  0x1000                    ; p_align

produces a program which runs fine, but in which the program header is not in the PT_LOAD segment:

eu-readelf --all exe
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           AMD x86-64
  Version:                           1 (current)
  Entry point address:               0x8048078
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:
  Size of this header:               64 (bytes)
  Size of program header entries:    56 (bytes)
  Number of program headers entries: 1
  Size of section header entries:    0 (bytes)
  Number of section headers entries: 0 ([0] not available)
  Section header string table index: 0

Section Headers:
[Nr] Name                 Type         Addr             Off      Size     ES Flags Lk Inf Al

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000078 0x0000000008048078 0x0000000008048078 0x000081 0x000081 RWE 0x1000
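Whether the program header table is part of a PT_LOAD segment comes down to a check on file offsets. A Python sketch with a hypothetical helper name, fed with values from the question's source (original: p_offset 0) and from the dump above (patched: p_offset 0x78):

```python
def phdrs_in_load(e_phoff, e_phentsize, e_phnum, loads):
    """True if the program header table lies inside some PT_LOAD's file range.

    `loads` is a list of (p_offset, p_filesz) pairs, one per PT_LOAD segment.
    """
    start = e_phoff
    end = e_phoff + e_phentsize * e_phnum
    return any(off <= start and end <= off + filesz for off, filesz in loads)

print(phdrs_in_load(64, 56, 1, [(0x0, 0x81)]))   # original: p_offset = 0
print(phdrs_in_load(64, 56, 1, [(0x78, 0x81)]))  # patched:  p_offset = 0x78
```

This prints True for the original program and False for the patched one, which nevertheless runs.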

I have tried adding padding

You didn't do that correctly. Assembling your "with padding" source produces the following exe-padding:

...
  Entry point address:               0x8049000
...
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x001000 0x0000000008048000 0x0000000008048000 0x000009 0x000009 RWE 0x1000

The kernel starts this binary and immediately jumps to the entry point 0x8049000, which isn't mapped: the PT_LOAD segment maps file offset 0x1000 at 0x8048000, so it covers only the page starting at 0x8048000. The result is an immediate SIGSEGV.

To fix this, you need to adjust the entry address:

diff -u exe-padding.asm.orig exe-padding.asm
--- exe-padding.asm.orig        2021-02-07 18:57:31.800871195 -0800
+++ exe-padding.asm     2021-02-07 19:34:27.303071700 -0800
@@ -8,7 +8,7 @@
     dw  2                         ; e_type
     dw  62                        ; e_machine
     dd  1                         ; e_version
-    dq  _start                    ; e_entry
+    dq  _start - 0x1000           ; e_entry
     dq  programHeader - $$        ; e_phoff
     dq  0                         ; e_shoff
     dd  0                         ; e_flags

This again produces a working executable. For the record:

eu-readelf --all exe-padding
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           AMD x86-64
  Version:                           1 (current)
  Entry point address:               0x8048000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             
  Size of this header:               64 (bytes)
  Size of program header entries:    56 (bytes)
  Number of program headers entries: 1
  Size of section header entries:    0 (bytes)
  Number of section headers entries: 0 ([0] not available)
  Section header string table index: 0

Section Headers:
[Nr] Name                 Type         Addr             Off      Size     ES Flags Lk Inf Al

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x001000 0x0000000008048000 0x0000000008048000 0x000009 0x000009 RWE 0x1000
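The difference between the broken and the fixed exe-padding can be checked by rounding the segment out to whole pages and asking whether e_entry falls inside. A simplified Python sketch, assuming a single PT_LOAD with a page-aligned p_vaddr and 4 KiB pages; the helper names are hypothetical and permissions are ignored:

```python
PAGE = 0x1000

def mapped_range(p_vaddr, p_memsz, page=PAGE):
    """Page-rounded [start, end) address range covered by a PT_LOAD segment."""
    start = p_vaddr & ~(page - 1)
    end = (p_vaddr + p_memsz + page - 1) & ~(page - 1)
    return start, end

def entry_is_mapped(e_entry, p_vaddr, p_memsz):
    start, end = mapped_range(p_vaddr, p_memsz)
    return start <= e_entry < end

# Values from the two exe-padding dumps (p_vaddr 0x8048000, p_memsz 9).
print(entry_is_mapped(0x8049000, 0x8048000, 0x9))  # e_entry = _start
print(entry_is_mapped(0x8048000, 0x8048000, 0x9))  # e_entry = _start - 0x1000
```

The segment covers [0x8048000, 0x8049000), so the first call prints False and the second prints True.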

P.S. You are linking your 64-bit program at 0x08048000, which is the traditional load address for i*86 (32-bit) executables. x86_64 binaries traditionally start at 0x400000.

Update:

About the first example: p_filesz is still fileSize. I think that would extend past the end of the file.

That is correct: p_filesz and p_memsz should be reduced by the size of the headers (0x78 here), leaving just the 9 bytes of code. Note that both of these are rounded up to the page size (after adding p_offset), so for this example there is no practical difference.
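The "no practical difference" can be verified numerically. A Python sketch, assuming 4 KiB pages and that the mapping starts at the page containing p_vaddr, so its length is the in-page offset (0x78 here, the same as p_offset's) plus p_filesz, rounded up; `mapped_bytes` is a hypothetical name:

```python
PAGE = 0x1000

def mapped_bytes(p_vaddr, p_filesz, page=PAGE):
    """Length of the page-aligned file mapping needed for a PT_LOAD segment."""
    in_page = p_vaddr & (page - 1)  # equals p_offset % page for a valid segment
    return (in_page + p_filesz + page - 1) & ~(page - 1)

# First example from the answer: p_offset = 0x78, p_vaddr = 0x8048078.
print(hex(mapped_bytes(0x8048078, 0x81)))  # p_filesz = fileSize
print(hex(mapped_bytes(0x8048078, 0x09)))  # p_filesz = code only
```

Both calls print 0x1000: 0x78 + 0x81 and 0x78 + 0x9 each fit within a single page.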

Update 2:

Your revised source (pastebin.ubuntu.com/p/rgfVMrbcmJ) results in the following LOAD segment:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000078 0x0000000008048000 0x0000000008048000 0x000081 0x000081 RWE 0x1000

This binary will not run (the kernel rejects it), because it asks the kernel to do the impossible: map the bytes at file offset 0x78 to the start of a page.

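If an application made the equivalent mmap call, it would get an EINVAL error: mmap needs a page-aligned file offset, and a segment can only be turned into such a call when (offset % pagesize) == (addr % pagesize), so that both can be rounded down to a page boundary by the same amount.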
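That condition can be checked for each LOAD line in this thread. A Python sketch assuming 4 KiB pages (`congruent` is a hypothetical helper name):

```python
PAGE = 0x1000

def congruent(p_offset, p_vaddr, page=PAGE):
    """PT_LOAD requirement: p_offset and p_vaddr agree modulo the page size."""
    return p_offset % page == p_vaddr % page

# (p_offset, p_vaddr) pairs from the LOAD lines shown above.
cases = {
    "first answer example": (0x78, 0x8048078),
    "exe-padding":          (0x1000, 0x8048000),
    "Update 2":             (0x78, 0x8048000),
}
for name, (off, vaddr) in cases.items():
    print(f"{name}: {'ok' if congruent(off, vaddr) else 'rejected'}")
```

The first two pairs are congruent (0x78 vs 0x78, and 0 vs 0), while the Update 2 pair (0x78 vs 0) is not, which matches the kernel refusing to load it.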

Answered by Employed Russian, Sep 28 '22 at 09:09