In ELF, why do the headers need to be in one segment?

Tags: linux, assembly, elf

I have made this simple ELF for learning purposes:

bits 64
org 0x08048000

elfHeader:
    db  0x7F, "ELF", 2, 1, 1, 0   ; e_ident
    db 0                            ; abi version
    times 7 db 0                    ; unused padding
    dw  2                         ; e_type
    dw  62                        ; e_machine
    dd  1                         ; e_version
    dq  _start                    ; e_entry
    dq  programHeader - $$        ; e_phoff
    dq  0                         ; e_shoff
    dd  0                         ; e_flags
    dw  elfHeaderSize             ; e_ehsize
    dw  programHeaderSize         ; e_phentsize
    dw  1                         ; e_phnum
    dw  0                         ; e_shentsize
    dw  0                         ; e_shnum
    dw  0                         ; e_shstrndx

elfHeaderSize  equ $ - elfHeader

programHeader:
    dd  1                         ; p_type
    dd  7                         ; p_flags
    dq  0                         ; p_offset
    dq  $$                        ; p_vaddr
    dq  $$                        ; p_paddr
    dq  fileSize                  ; p_filesz
    dq  fileSize                  ; p_memsz
    dq  0x1000                    ; p_align

programHeaderSize equ  $ - programHeader

_start:
   xor rdi, rdi
   xor eax,eax
   mov al,60
   syscall

fileSize      equ     $ - $$

To assemble that code I use NASM:

nasm -f bin exe.asm -o exe

If you take a look at the programHeader, you will see that p_offset is 0 and p_filesz is fileSize. That means the segment contains the whole file, headers included. That's something I wasn't expecting (and I'm not the only one), but apparently the Linux operating system needs the headers to be in a segment of type PT_LOAD so that they get loaded.

This is the only resource I could find that mentions the fact that the headers are inside one segment: https://www.intezer.com/blog/research/executable-linkable-format-101-part1-sections-segments/

Something important to highlight about segments is that only PT_LOAD segments get loaded into memory. Therefore, every other segment is mapped within the memory range of one of the PT_LOAD segments.

In order to understand the relationship between Sections and Segments, we can picture segments as a tool to make the linux loader’s life easier, as they group sections by attributes into single segments in order to make the loading process of the executable more efficient, instead of loading each individual section into memory. The following diagram attempts to illustrate this concept:

But I don't understand why Linux needs those headers to be loaded at run time. What are they used for? If they are needed for the process to run, couldn't Linux load them by itself?

EDIT:

It has been mentioned in the comments that the headers don't need to be loaded; however, they are sometimes loaded anyway to avoid having to add padding. I have tried adding padding so that the code is 4KB aligned, but it didn't work. Here's my attempt:

bits 64
org 0x08048000

elfHeader:
    db  0x7F, "ELF", 2, 1, 1, 0   ; e_ident
    db 0                            ; abi version
    times 7 db 0                    ; unused padding
    dw  2                         ; e_type
    dw  62                        ; e_machine
    dd  1                         ; e_version
    dq  _start                    ; e_entry
    dq  programHeader - $$        ; e_phoff
    dq  0                         ; e_shoff
    dd  0                         ; e_flags
    dw  elfHeaderSize             ; e_ehsize
    dw  programHeaderSize         ; e_phentsize
    dw  1                         ; e_phnum
    dw  0                         ; e_shentsize
    dw  0                         ; e_shnum
    dw  0                         ; e_shstrndx

elfHeaderSize  equ $ - elfHeader

programHeader:
    dd  1                         ; p_type
    dd  7                         ; p_flags
    dq  _start - $$               ; p_offset
    dq  $$                        ; p_vaddr
    dq  $$                        ; p_paddr
    dq  codeSize                  ; p_filesz
    dq  codeSize                  ; p_memsz
    dq  0x1000                    ; p_align

programHeaderSize equ  $ - programHeader

; padding until 4KB
paddingUntil4k equ 4*1024 - ($ - elfHeader)
times paddingUntil4k db 0


_start:
   xor rdi, rdi
   xor eax,eax
   mov al,60
   syscall

codeSize equ $ - _start
fileSize equ $ - $$

asked Feb 03 '21 20:02

tuket

1 Answer

But I don't understand why Linux needs those headers to be loaded at run time.

It doesn't.

What are they used for? If they are needed for the process to run, couldn't Linux load them by itself?

To answer all of these questions, you need to look at the Linux kernel source.

In the source, you can see that in fact program headers do not need to be a part of any PT_LOAD segment, and that the kernel will read them all on its own.
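As a rough illustration of that point (a Python sketch, not the kernel's actual code), the program headers can be located purely through the file offset stored in e_phoff, without consulting any segment. The helper name read_program_headers and the hand-built image are assumptions made up for this example; the nine code bytes are the encoding I would expect NASM to emit for the four instructions in the question.

```python
import struct

# ELF64 file header: 16-byte e_ident followed by the fixed-size fields.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # 64 bytes
# ELF64 program header: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align.
ELF64_PHDR = struct.Struct("<IIQQQQQQ")           # 56 bytes


def read_program_headers(image: bytes):
    """Hypothetical helper: return (e_entry, [phdr tuples]) using file offsets only."""
    fields = ELF64_EHDR.unpack_from(image, 0)
    e_entry, e_phoff = fields[4], fields[5]
    e_phentsize, e_phnum = fields[9], fields[10]
    phdrs = [
        ELF64_PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        for i in range(e_phnum)
    ]
    return e_entry, phdrs


# Hand-built image whose only PT_LOAD segment covers the code, not the headers.
BASE = 0x08048000
CODE = bytes.fromhex("4831ff31c0b03c0f05")  # assumed encoding of the exit stub
ehdr = ELF64_EHDR.pack(
    b"\x7fELF\x02\x01\x01" + bytes(9),      # e_ident
    2, 62, 1,                               # e_type, e_machine, e_version
    BASE + 0x78, 64, 0, 0,                  # e_entry, e_phoff, e_shoff, e_flags
    64, 56, 1, 0, 0, 0,                     # sizes and counts
)
phdr = ELF64_PHDR.pack(1, 7, 0x78, BASE + 0x78, BASE + 0x78,
                       len(CODE), len(CODE), 0x1000)
image = ehdr + phdr + CODE

entry, phdrs = read_program_headers(image)
print(hex(entry), hex(phdrs[0][2]))  # 0x8048078 0x78
```

The program header sits at file offset 64, outside the range described by the PT_LOAD entry (which starts at offset 0x78), yet it is still found and parsed.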

Changing your original program like so:

diff -u exe.asm.orig exe.asm
--- exe.asm.orig        2021-02-07 18:54:34.449336515 -0800
+++ exe.asm     2021-02-07 18:53:19.773532451 -0800
@@ -24,9 +24,9 @@
 programHeader:
     dd  1                         ; p_type
     dd  7                         ; p_flags
-    dq  0                         ; p_offset
-    dq  $$                        ; p_vaddr
-    dq  $$                        ; p_paddr
+    dq  _start - $$               ; p_offset
+    dq  _start                    ; p_vaddr
+    dq  _start                    ; p_paddr
     dq  fileSize                  ; p_filesz
     dq  fileSize                  ; p_memsz
     dq  0x1000                    ; p_align

produces a program which runs fine, but in which the program header is not in the PT_LOAD segment:

 eu-readelf --all exe
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           AMD x86-64
  Version:                           1 (current)
  Entry point address:               0x8048078
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:
  Size of this header:               64 (bytes)
  Size of program header entries:    56 (bytes)
  Number of program headers entries: 1
  Size of section header entries:    0 (bytes)
  Number of section headers entries: 0 ([0] not available)
  Section header string table index: 0

Section Headers:
[Nr] Name                 Type         Addr             Off      Size     ES Flags Lk Inf Al

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000078 0x0000000008048078 0x0000000008048078 0x000081 0x000081 RWE 0x1000

I have tried adding padding

You didn't do that correctly. Using your "with padding" source results in the following exe-padding:

...
  Entry point address:               0x8049000
...
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x001000 0x0000000008048000 0x0000000008048000 0x000009 0x000009 RWE 0x1000

This binary is started by the kernel and immediately jumps to the entry address 0x8049000, which isn't mapped (since it's not covered by the PT_LOAD segment), resulting in an immediate SIGSEGV.
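A quick way to sanity-check this kind of mistake is to test whether the entry address falls inside any PT_LOAD range. The sketch below is a simplification: the function name entry_in_segment is made up for illustration, and it ignores page rounding, so it is stricter than the kernel's actual mapping.

```python
def entry_in_segment(e_entry, load_segments):
    """Hypothetical check: is e_entry inside [p_vaddr, p_vaddr + p_memsz) of any PT_LOAD?

    load_segments is a list of (p_vaddr, p_memsz) pairs.
    Page rounding is deliberately ignored, so this is a conservative test.
    """
    return any(vaddr <= e_entry < vaddr + memsz
               for vaddr, memsz in load_segments)


# Values taken from the exe-padding output: one segment at 0x8048000, 9 bytes long.
segments = [(0x08048000, 0x9)]

print(entry_in_segment(0x08049000, segments))  # False
print(entry_in_segment(0x08048000, segments))  # True
```

The first call uses the entry address the padded source produces; the second uses the address where the nine code bytes actually end up.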

To fix this, you need to adjust the entry address:

diff -u exe-padding.asm.orig exe-padding.asm
--- exe-padding.asm.orig        2021-02-07 18:57:31.800871195 -0800
+++ exe-padding.asm     2021-02-07 19:34:27.303071700 -0800
@@ -8,7 +8,7 @@
     dw  2                         ; e_type
     dw  62                        ; e_machine
     dd  1                         ; e_version
-    dq  _start                    ; e_entry
+    dq  _start - 0x1000           ; e_entry
     dq  programHeader - $$        ; e_phoff
     dq  0                         ; e_shoff
     dd  0                         ; e_flags

This again produces a working executable. For the record:

eu-readelf --all exe-padding
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Ident Version:                     1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           AMD x86-64
  Version:                           1 (current)
  Entry point address:               0x8048000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             
  Size of this header:               64 (bytes)
  Size of program header entries:    56 (bytes)
  Number of program headers entries: 1
  Size of section header entries:    0 (bytes)
  Number of section headers entries: 0 ([0] not available)
  Section header string table index: 0

Section Headers:
[Nr] Name                 Type         Addr             Off      Size     ES Flags Lk Inf Al

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x001000 0x0000000008048000 0x0000000008048000 0x000009 0x000009 RWE 0x1000

P.S. You are linking your 64-bit program at 0x08048000, which is the traditional load address for i*86 (32-bit) executables. x86_64 binaries traditionally start at 0x400000.

Update:

About the first example: p_filesz is still fileSize. I think the segment would then extend past the end of the file.

That is correct: p_filesz and p_memsz should be reduced by the size of headers (0x78 here). Note that both of these will be rounded up to page size (after adding p_offset), so for this example there is no practical difference.
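To see why the two sizes behave the same here, this is a simplified model of the rounding (not the kernel's code; the helper name mapped_range and the 4 KiB page size are assumptions):

```python
PAGE_SIZE = 0x1000  # assumed 4 KiB pages


def mapped_range(p_vaddr, p_memsz, page_size=PAGE_SIZE):
    """Hypothetical helper: page-aligned [start, end) that covers the segment."""
    start = p_vaddr & ~(page_size - 1)                              # round down
    end = (p_vaddr + p_memsz + page_size - 1) & ~(page_size - 1)    # round up
    return start, end


# Segment as emitted (size 0x81) versus with the header size subtracted (size 0x09).
print([hex(x) for x in mapped_range(0x08048078, 0x81)])  # ['0x8048000', '0x8049000']
print([hex(x) for x in mapped_range(0x08048078, 0x09)])  # ['0x8048000', '0x8049000']
```

Both variants end up covering the same single page, which is why the oversized p_filesz makes no practical difference in this example.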

Update 2:

pastebin.ubuntu.com/p/rgfVMrbcmJ

This results in the following LOAD segment:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000078 0x0000000008048000 0x0000000008048000 0x000081 0x000081 RWE 0x1000

This binary will not run (the kernel will reject it), because it is asking the kernel to do the impossible: to mmap the bytes at file offset 0x78 to the start of a page.

If an application made the equivalent mmap call, it would get an EINVAL error, because mmap requires that (offset % pagesize) == (addr % pagesize).
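The condition can be checked against the three LOAD segments shown in this answer. This is only a sketch of the arithmetic; the function name offset_congruent and the 4 KiB page size are assumptions.

```python
def offset_congruent(p_offset, p_vaddr, page_size=0x1000):
    """Hypothetical check: do file offset and address agree modulo the page size?"""
    return p_offset % page_size == p_vaddr % page_size


print(offset_congruent(0x78, 0x08048078))    # True
print(offset_congruent(0x1000, 0x08048000))  # True
print(offset_congruent(0x78, 0x08048000))    # False
```

The first two calls correspond to the working binaries (segment shifted past the headers, and the padded version); the third is the rejected layout, where offset 0x78 is paired with a page-aligned address.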

answered Sep 28 '22 09:09

Employed Russian
