
Base Address of ELF

Tags: linux, elf

I am trying to find the base address of ELF files. I know that you can use readelf to find the program entry point and the details of each section (address, size, flags and so on).

For example, the linker bases programs for the x86 architecture at 0x8048000. Using readelf I can see the program entry point, but no field in the output gives the base address.

$ readelf -e test
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048390
  Start of program headers:          52 (bytes into file)
  Start of section headers:          4436 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         9
  Size of section headers:           40 (bytes)
  Number of section headers:         30

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048154 000154 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048168 000168 000020 00   A  0   0  4
  [ 3] .note.gnu.build-i NOTE            08048188 000188 000024 00   A  0   0  4
  [ 4] .gnu.hash         GNU_HASH        080481ac 0001ac 000024 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          080481d0 0001d0 000070 10   A  6   1  4

In the section details, the Offset appears to be relative to the base address of the ELF file.

So the .dynsym section starts at address 0x080481d0 and its offset is 0x1d0. This would mean the base address is 0x08048000. Is this correct?

Similarly, for programs compiled for other architectures such as PPC, ARM and MIPS, I cannot see the base address, only the OEP and the section headers.

Neon Flash asked Aug 18 '13 06:08


3 Answers

You need to check the segment table, also known as the program headers (readelf -l).

Elf file type is EXEC (Executable file)
Entry point 0x804a7a0
There are 9 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00120 0x00120 R E 0x4
  INTERP         0x000154 0x08048154 0x08048154 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x10fc8 0x10fc8 R E 0x1000
  LOAD           0x011000 0x08059000 0x08059000 0x0038c 0x01700 RW  0x1000
  DYNAMIC        0x01102c 0x0805902c 0x0805902c 0x000f8 0x000f8 RW  0x4
  NOTE           0x000168 0x08048168 0x08048168 0x00020 0x00020 R   0x4
  TLS            0x011000 0x08059000 0x08059000 0x00000 0x0005c R   0x4
  GNU_EH_FRAME   0x00d3c0 0x080553c0 0x080553c0 0x00c5c 0x00c5c R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

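The virtual address of the first (lowest) LOAD segment is the default load base of the file. You can see it's 0x08048000 for this file. For a position-independent executable (type DYN) this value is usually 0, and the actual base is chosen when the file is loaded.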
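The rule above can be sketched in C as a scan over the program header table. This is a minimal illustration for 32-bit files, assuming the table has already been read into memory; default_load_base is a hypothetical helper name, not part of binutils or libc.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Return the lowest PT_LOAD virtual address, rounded down to the segment
 * alignment, or UINT32_MAX if the table holds no PT_LOAD entry.
 * default_load_base is a hypothetical name used for illustration. */
uint32_t default_load_base(const Elf32_Phdr *phdr, size_t count)
{
    uint32_t base = UINT32_MAX;

    assert(phdr != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (phdr[i].p_type != PT_LOAD)
            continue;                       /* only LOAD segments are mapped */

        uint32_t addr = phdr[i].p_vaddr;
        if (phdr[i].p_align > 1)
            addr &= ~(phdr[i].p_align - 1); /* p_align is a power of two */
        if (addr < base)
            base = addr;
    }
    return base;
}
```

Fed the two LOAD entries from the readelf -l output above (0x08048000 and 0x08059000, both aligned to 0x1000), the function returns 0x08048000.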

Igor Skochinsky answered Oct 20 '22 05:10


The base address at which the text segment of an ELF file is mapped is set by the ld(1) linker script, which binutils generates from the script template elf.sc on Linux.

The script defines the following variables used by the linker ld(1):

#       TEXT_START_ADDR - the first byte of the text segment, after any
#               headers.
#       TEXT_BASE_ADDRESS - the first byte of the text segment.
#       TEXT_START_SYMBOLS - symbols that appear at the start of the
#               .text section.

You can inspect the current values using the command:

~$ ld --verbose |grep SEGMENT_START
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  . = SEGMENT_START("ldata-segment", .);

The default text-segment addresses are:

  • 0x08048000 on 32-bit x86
  • 0x400000 on 64-bit x86

Also, the base address of the program interpreter (the dynamic linker) is stored in the auxiliary vector under the key AT_BASE. The auxiliary vector is an array of Elf32_auxv_t or Elf64_auxv_t structures located after envp on the process stack. The Linux kernel fills it in while loading the ELF binary, in the function create_elf_tables() in fs/binfmt_elf.c. The following code snippet shows how to read the value:

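$ cat at_base.c
#include <stdio.h>
#include <elf.h>

int
main(int argc, char* argv[], char* envp[])
{
        Elf64_auxv_t *auxp;

        (void)argc;
        (void)argv;

        /* The auxiliary vector starts right after the NULL that ends envp. */
        while (*envp++ != NULL);

        /* Walk the entries until the AT_NULL terminator. */
        for (auxp = (Elf64_auxv_t *)envp; auxp->a_type != AT_NULL; auxp++) {
            if (auxp->a_type == AT_BASE) {
                printf("AT_BASE: 0x%lx\n", (unsigned long)auxp->a_un.a_val);
            }
        }

        return 0;
}
$ clang -o at_base at_base.c
$ ./at_base
AT_BASE: 0x7fcfd4025000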

See also: Linux Auxiliary Vector definition, Auxiliary Vector Reference

This address used to be fixed on the 32-bit x86 architecture, but with ASLR it is now randomized. To disable randomization, run the program under setarch i386 -R.

sbz answered Oct 20 '22 04:10


It's defined in the linker script. You can dump the default linker script with ld --verbose. Example output:

GNU ld (GNU Binutils) 2.23.1
  Supported emulations:
   elf_x86_64
   elf32_x86_64
   elf_i386
   i386linux
   elf_l1om
   elf_k1om
using internal linker script:
==================================================
/* Script for -z combreloc: combine and sort reloc sections */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
          "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("/nix/store/kxf1p7l7lgm6j5mjzkiwcwzc98s9f1az-binutils-2.23.1/x86_64-unknown-linux-gnu/lib64"); SEARCH_DIR("/nix/store/kxf1p7l7lgm6j5mjzkiwcwzc98s9f1az-binutils-2.23.1/lib64"); SEARCH_DIR("/nix/store/kxf1p7l7lgm6j5mjzkiwcwzc98s9f1az-binutils-2.23.1/x86_64-unknown-linux-gnu/lib"); SEARCH_DIR("/nix/store/kxf1p7l7lgm6j5mjzkiwcwzc98s9f1az-binutils-2.23.1/lib");
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .interp         : { *(.interp) }
  .note.gnu.build-id : { *(.note.gnu.build-id) }
  .hash           : { *(.hash) }
  .gnu.hash       : { *(.gnu.hash) }
  .dynsym         : { *(.dynsym) }
  .dynstr         : { *(.dynstr) }
  .gnu.version    : { *(.gnu.version) }
  .gnu.version_d  : { *(.gnu.version_d) }
  .gnu.version_r  : { *(.gnu.version_r) }

(snip)

In case you missed it: __executable_start = SEGMENT_START("text-segment", 0x400000).

And for me, sure enough, when I link a simple .o file into a binary, the entry point address is very close to 0x400000.

The entry point address in the ELF metadata is this value plus the offset of the _start symbol from the start of the text segment, which includes the headers and the sections placed before .text. Note also that the entry symbol can be configured. Again from my default linker script example: ENTRY(_start).

andrewrk answered Oct 20 '22 03:10