Why would an ELF executable have 4 LOAD segments?

There is a remote 64-bit *nix server that compiles user-provided code (which should be written in Rust, but I don't think that matters since it uses LLVM). I don't know which compiler/linker flags it uses, but the compiled ELF executable looks weird: it has 4 LOAD segments.

$ readelf -e executable
...
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
...
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000004138 0x0000000000004138  R      0x1000
  LOAD           0x0000000000005000 0x0000000000005000 0x0000000000005000
                 0x00000000000305e9 0x00000000000305e9  R E    0x1000
  LOAD           0x0000000000036000 0x0000000000036000 0x0000000000036000
                 0x000000000000d808 0x000000000000d808  R      0x1000
  LOAD           0x0000000000043da0 0x0000000000044da0 0x0000000000044da0
                 0x0000000000002290 0x00000000000024a0  RW     0x1000
...

On my own system, all the executables I looked at have only 2 LOAD segments:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
...
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000003000c0 0x00000000003000c0  R E    0x200000
  LOAD           0x00000000003002b0 0x00000000005002b0 0x00000000005002b0
                 0x00000000000776c8 0x000000000009b200  RW     0x200000
...
  1. Under what circumstances (compiler/linker versions, flags, etc.) might a compiler build an ELF with 4 LOAD segments?
  2. What is the point of having 4 LOAD segments? I imagine that having a segment with read but not execute permission might help against certain exploits, but why have two such segments?

asked Sep 02 '19 17:09 by kreo


1 Answer

A typical Linux executable linked with BFD ld or Gold has 2 loadable segments: the ELF header, .text and .rodata are merged into the first (R E) segment, and .data, .bss and other writable sections are merged into the second (RW) segment.

Here is the typical section to segment mapping:

$ echo "int foo; int main() { return 0;}"  | clang -xc - -o a.out-gold -fuse-ld=gold
$ readelf -Wl a.out-gold

Elf file type is EXEC (Executable file)
Entry point 0x400420
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R   0x8
  INTERP         0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0006b0 0x0006b0 R E 0x1000
  LOAD           0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001f8 0x000200 RW  0x1000
  DYNAMIC        0x000e28 0x0000000000401e28 0x0000000000401e28 0x0001b0 0x0001b0 RW  0x8
  NOTE           0x000254 0x0000000000400254 0x0000000000400254 0x000020 0x000020 R   0x4
  GNU_EH_FRAME   0x00067c 0x000000000040067c 0x000000000040067c 0x000034 0x000034 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001e8 0x0001e8 RW  0x8

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .dynsym .dynstr .gnu.hash .hash .gnu.version .gnu.version_r .rela.dyn .init .text .fini .rodata .eh_frame .eh_frame_hdr
   03     .fini_array .init_array .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr
   07
   08     .fini_array .init_array .dynamic .got .got.plt

This minimizes the number of mmap calls the kernel must perform to load such an executable, but at a security cost: the data in .rodata shouldn't be executable, but it is, because it is merged with .text, which must be executable. This may significantly increase the attack surface for someone trying to hijack a process.
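If readelf isn't available where you want to check this, the LOAD entries and their R/W/E flags can be read straight out of the file. Here is a minimal Python sketch, assuming a little-endian ELF64 file (as on x86-64); the function name load_segments is mine, not part of any tool, and the field offsets follow the standard Elf64_Ehdr / Elf64_Phdr layouts:

```python
import struct

PT_LOAD = 1                    # p_type value for loadable segments
PF_X, PF_W, PF_R = 1, 2, 4     # p_flags permission bits


def load_segments(data: bytes):
    """Return (offset, vaddr, filesz, memsz, flags) for every PT_LOAD entry.

    Sketch only: handles little-endian ELF64 images and nothing else.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")

    # Elf64_Ehdr: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)

    segments = []
    for i in range(e_phnum):
        # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align.
        (p_type, p_flags, p_offset, p_vaddr,
         _p_paddr, p_filesz, p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        # Same three-column style as readelf's "Flg" column, e.g. "R E".
        flags = (("R" if p_flags & PF_R else " ")
                 + ("W" if p_flags & PF_W else " ")
                 + ("E" if p_flags & PF_X else " "))
        segments.append((p_offset, p_vaddr, p_filesz, p_memsz, flags))
    return segments
```

Calling load_segments(open(path, "rb").read()) on a binary like the one above should return one tuple per LOAD line; any entry whose flags contain both "R" and "E" is a segment where read-only data, if it was placed there, is also executable.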

Newer Linux systems, in particular those that use LLD to link binaries, prioritize security over speed and put the ELF header and .rodata into a first R-only segment, resulting in 3 LOAD segments and improved security. Here is a typical mapping:

$ echo "int foo; int main() { return 0;}"  | clang -xc - -o a.out-lld -fuse-ld=lld
$ readelf -Wl a.out-lld

Elf file type is EXEC (Executable file)
Entry point 0x201000
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x000230 0x000230 R   0x8
  INTERP         0x000270 0x0000000000200270 0x0000000000200270 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000558 0x000558 R   0x1000
  LOAD           0x001000 0x0000000000201000 0x0000000000201000 0x000185 0x000185 R E 0x1000
  LOAD           0x002000 0x0000000000202000 0x0000000000202000 0x001170 0x002005 RW  0x1000
  DYNAMIC        0x003010 0x0000000000203010 0x0000000000203010 0x000150 0x000150 RW  0x8
  GNU_RELRO      0x003000 0x0000000000203000 0x0000000000203000 0x000170 0x001000 R   0x1
  GNU_EH_FRAME   0x000440 0x0000000000200440 0x0000000000200440 0x000034 0x000034 R   0x1
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x00028c 0x000000000020028c 0x000000000020028c 0x000020 0x000020 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .rodata .dynsym .gnu.version .gnu.version_r .gnu.hash .hash .dynstr .rela.dyn .eh_frame_hdr .eh_frame
   03     .text .init .fini
   04     .data .tm_clone_table .fini_array .init_array .dynamic .got .bss
   05     .dynamic
   06     .fini_array .init_array .dynamic .got
   07     .eh_frame_hdr
   08
   09     .note.ABI-tag

Not to be left behind, the newer BFD ld (my version is 2.31.1) also makes the ELF header and .rodata read-only, but does not merge the two R-only segments (which sit on either side of .text) into one, resulting in 4 loadable segments:

$ echo "int foo; int main() { return 0;}"  | clang -xc - -o a.out-bfd -fuse-ld=bfd
$ readelf -Wl a.out-bfd

Elf file type is EXEC (Executable file)
Entry point 0x401020
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x000268 0x000268 R   0x8
  INTERP         0x0002a8 0x00000000004002a8 0x00000000004002a8 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0003f8 0x0003f8 R   0x1000
  LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x00018d 0x00018d R E 0x1000
  LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000110 0x000110 R   0x1000
  LOAD           0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001e8 0x0001f0 RW  0x1000
  DYNAMIC        0x002e50 0x0000000000403e50 0x0000000000403e50 0x0001a0 0x0001a0 RW  0x8
  NOTE           0x0002c4 0x00000000004002c4 0x00000000004002c4 0x000020 0x000020 R   0x4
  GNU_EH_FRAME   0x002004 0x0000000000402004 0x0000000000402004 0x000034 0x000034 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001c0 0x0001c0 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn
   03     .init .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .dynamic .got .got.plt .data .bss
   06     .dynamic
   07     .note.ABI-tag
   08     .eh_frame_hdr
   09
   10     .init_array .fini_array .dynamic .got
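One way to see why the counts come out as 2, 3 and 4 is to treat each LOAD segment as a run of consecutive sections with the same permissions. The Python sketch below is a toy model only, not how any linker is implemented: the section groups and their order are simplified from the listings above, and count_load_segments and its separate_code parameter are names I made up for illustration.

```python
from itertools import groupby

# Simplified section order, copied from the readelf listings above.
# Each entry is (section group, permissions of the segment it needs).
BFD_ORDER = [
    ("headers .dynsym .dynstr ...", "R"),
    (".init .text .fini", "R E"),
    (".rodata .eh_frame", "R"),
    (".dynamic .got .data .bss", "RW"),
]
LLD_ORDER = [
    ("headers .rodata .dynsym .eh_frame ...", "R"),
    (".text .init .fini", "R E"),
    (".data .dynamic .got .bss", "RW"),
]


def count_load_segments(layout, separate_code=True):
    """Count runs of consecutive section groups that share permissions."""
    perms = [perm for _name, perm in layout]
    if not separate_code:
        # Older layout: read-only data is folded into the executable segment.
        perms = ["R E" if perm == "R" else perm for perm in perms]
    # groupby yields one group per run of equal adjacent values.
    return sum(1 for _ in groupby(perms))


print(count_load_segments(BFD_ORDER))                       # 4
print(count_load_segments(LLD_ORDER))                       # 3
print(count_load_segments(BFD_ORDER, separate_code=False))  # 2
```

In this model the fourth segment appears only because BFD ld keeps read-only sections both before and after .text, while LLD places all of them before it.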

Finally, some of these choices are affected by the --rosegment / --no-rosegment linker option (or, for BFD ld, -z noseparate-code, passed through the compiler driver as -Wl,-z,noseparate-code).

answered Oct 12 '22 16:10 by Employed Russian