I've been reading the ELF specification and cannot figure out where the program entry point and _start address come from.
It seems like they should have to be in a pretty consistent place, but I made a few trivial programs, and _start is always in a different place.
Can anyone clarify?
you can use readelf and objdump to read parts of an elf file. You can also use 'hexdump filename' to get a hexdump of the contents of a binary file (this is likely only useful if you like reading machine code or you are writing an assembler).
An ELF file consists of zero or more segments, and describe how to create a process/memory image for runtime execution. When the kernel sees these segments, it uses them to map them into virtual address space, using the mmap(2) system call. In other words, it converts predefined instructions into a memory image.
An elf file contains the bin information but it is surrounded by lots of other information, possible debug info, symbols, can distinguish code from data within the binary.
The ELF file is built for an x86-64 bit machine. There are two important pieces of information present in the ELF header. One is the ELF program header part and the other is the ELF section header part. When a program is compiled, different things are generated after compilation.
The _start
symbol may be defined in any object file. Normally it is generated automatically (it corresponds to main
in C). You can generate it yourself, for instance in an assembler source file:
.globl _start
_start:
// assembly here
When the linker has processed all object files it looks for the _start
symbol and puts its value in the e_entry
field of the elf header. The loader takes the address from this field and makes a call to it after it has finished loading all sections in memory and is ready to execute the file.
Take a look at the linker script ld
is using:
ld -verbose
The format is documented at: https://sourceware.org/binutils/docs-2.25/ld/Scripts.html
It determines basically everything about how the executable will be generated.
On Binutils 2.24 Ubuntu 14.04 64-bit, it contains the line:
ENTRY(_start)
which sets the entry point to the _start
symbol (goes to the ELF header as mentioned by ctn)
And then:
. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
which sets the address of the first headers to 0x400000
+ SIZEOF_HEADERS
.
I have modified that address to 0x800000
, passed my custom script with ld -T
and it worked: readelf -s
says that _start
is at that address.
Another way to change it is to use the -Ttext-segment=0x800000
option.
The reason for using 0x400000
= 4Mb = getconf PAGE_SIZE
is to start at the beginning of the second page as asked at: Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?
A question describes how to set _start
from the command line: Why is the ELF entry point 0x8048000 not changeable with the "ld -e" option?
SIZEOF_HEADERS
is the size of the ELF + program headers, which are at the beginning of the ELF file. That data gets loaded into the very beginning of the virtual memory space by Linux (TODO why?) In a minimal Linux x86-64 hello world with 2 program headers it is worth 0xb0
, so that the _start
symbol comes at 0x4000b0.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With