
In an ELF file, how does the address for _start get determined?

Tags: symbols, elf

I've been reading the ELF specification and cannot figure out where the program entry point and _start address come from.

It seems like they should have to be in a pretty consistent place, but I made a few trivial programs, and _start is always in a different place.

Can anyone clarify?

asked Nov 24 '10 22:11 by samoz


2 Answers

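The _start symbol may be defined in any object file. Normally it is supplied automatically: when you build a C program, the C runtime startup code that gets linked in (for example crt1.o with glibc) defines _start, and that code eventually calls main. You can also define it yourself, for instance in an assembler source file: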

.globl _start
_start:
    // assembly here

Once the linker has processed all object files, it looks up the _start symbol and stores its value in the e_entry field of the ELF header. The loader reads the address from this field and jumps to it once it has finished mapping the loadable segments into memory and the file is ready to execute.
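To see the value that ended up in e_entry, the header can be read directly. The following is a minimal sketch, not a full ELF parser: the function names `read_entry_point` and `read_entry_point_from_file` are made up for this illustration, and it relies only on the documented header layout, where e_entry sits at byte offset 24 (after the 16-byte e_ident, e_type, e_machine and e_version).

```python
import struct


def read_entry_point(header: bytes) -> int:
    """Return e_entry from the leading bytes of an ELF file.

    Hypothetical helper for illustration; expects at least the first
    32 bytes of the file.
    """
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")

    ei_class = header[4]  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    ei_data = header[5]   # EI_DATA: 1 = little-endian, 2 = big-endian
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or data encoding")

    endian = "<" if ei_data == 1 else ">"
    # e_entry is 8 bytes wide in ELF64 and 4 bytes wide in ELF32.
    field = "Q" if ei_class == 2 else "I"

    # Offset 24 = e_ident (16) + e_type (2) + e_machine (2) + e_version (4).
    (entry,) = struct.unpack_from(endian + field, header, 24)
    return entry


def read_entry_point_from_file(path: str) -> int:
    """Read just enough of the file to cover e_entry and decode it."""
    with open(path, "rb") as f:
        return read_entry_point(f.read(64))
```

For an executable linked with the default script, `hex(read_entry_point_from_file("a.out"))` should match the "Entry point address" line printed by `readelf -h a.out`, and the value of _start reported by `readelf -s a.out`.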

answered Oct 10 '22 04:10 by ctn


Take a look at the linker script ld is using:

ld -verbose

The format is documented at: https://sourceware.org/binutils/docs-2.25/ld/Scripts.html

It determines basically everything about how the executable will be generated.

With Binutils 2.24 on Ubuntu 14.04 64-bit, it contains the line:

ENTRY(_start)

which sets the entry point to the _start symbol (its address goes into the ELF header, as mentioned by ctn).

And then:

. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;

which sets the location counter, and therefore the address of the first section placed after the headers, to 0x400000 + SIZEOF_HEADERS.

I have modified that address to 0x800000, passed my custom script with ld -T and it worked: readelf -s says that _start is at that address.

Another way to change it is to use the -Ttext-segment=0x800000 option.

Note that 0x400000 is 4 MiB, which is not the page size (getconf PAGE_SIZE typically reports 4096 on x86-64). Why the image does not start at address 0 is discussed at: Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?

A related question covers setting the entry point from the command line: Why is the ELF entry point 0x8048000 not changeable with the "ld -e" option?

SIZEOF_HEADERS is the size of the ELF header plus the program headers, which sit at the beginning of the ELF file. With this layout Linux maps that data at the very start of the text segment, because the first loadable segment begins at file offset 0 and therefore covers the headers. In a minimal Linux x86-64 hello world with 2 program headers it is 0xb0 (a 64-byte ELF header plus two 56-byte program headers), so the _start symbol ends up at 0x4000b0.
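The arithmetic behind 0x4000b0 can be checked with a few lines. This is only a sketch: the helper names are invented for the illustration, the sizes are the fixed ELF64 structure sizes (64 bytes for the ELF header, 56 bytes per program header), and it assumes _start is the very first thing placed after the headers, with no other section or alignment padding in between.

```python
ELF64_EHDR_SIZE = 64  # sizeof(Elf64_Ehdr)
ELF64_PHDR_SIZE = 56  # sizeof(Elf64_Phdr)


def sizeof_headers(num_program_headers: int) -> int:
    """Size of the ELF header plus all program headers (ELF64 only)."""
    return ELF64_EHDR_SIZE + num_program_headers * ELF64_PHDR_SIZE


def first_address_after_headers(segment_start: int, num_program_headers: int) -> int:
    """Mirror of `. = SEGMENT_START(...) + SIZEOF_HEADERS` in the linker script."""
    return segment_start + sizeof_headers(num_program_headers)


print(hex(sizeof_headers(2)))                         # 0xb0
print(hex(first_address_after_headers(0x400000, 2)))  # 0x4000b0
```

An executable with more program headers (for instance a dynamically linked one) has a larger SIZEOF_HEADERS, which is one reason _start lands at different addresses in different programs.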