
Why do Linux program .text sections start at 0x08048000 and stack tops start at 0xbfffffff?

According to Assembly Primer For Hackers (Part 2) Virtual Memory Organization, Linux program .text sections start at 0x08048000 and stack tops start at 0xbfffffff. What is the significance of these numbers? Why not start .text at 0x00000000 (or 0x00000020 or 0x00000040 to go the next 32 or 64 bits past NULL)? Why not start the top of the stack at 0xffffffff?

Heath Borders asked Feb 10 '13 05:02

2 Answers

There's not much significance to these particular numbers.

The stack typically grows downwards (to the lower addresses) and so it's somewhat reasonable (but not mandatory) to place it at high addresses and have some room for its expansion towards the lower addresses.

As for not using address 0 for program sections, there's some logic here. First, a lot of software uses 0 for NULL, a legal invalid pointer in C and C++, which should not be dereferenced. A lot of software has bugs in that it actually attempts to read or write memory at address 0 without proper pointer validation. If you make the memory area around address 0 inaccessible to the program, you can spot some of these bugs (the program will crash or stop in the debugger). Also, since NULL is a legal invalid pointer, there should be no data or code at that address (if there is, you are unable to distinguish a pointer to it from NULL).
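As a rough illustration of the point about keeping address 0 inaccessible, the sketch below asks the kernel whether the page containing a given address is mapped at all, without touching the memory. The helper name page_is_mapped is made up for this example, and the approach assumes Linux, where mincore() fails with ENOMEM for an unmapped range. Note that it reports "mapped", not "accessible": a page mapped with no permissions would still count as mapped.

```c
#define _DEFAULT_SOURCE   /* exposes mincore() in glibc headers */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, for illustration only.
   Returns 1 if the page containing addr is mapped in this process,
   0 if it is not mapped, and -1 on any other error.
   The memory itself is never read, so asking about address 0 is safe. */
int page_is_mapped(const void *addr)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Round the address down to the start of its page. */
    uintptr_t base = (uintptr_t)addr & ~((uintptr_t)page - 1);

    unsigned char residency;   /* one byte is enough for one page */
    if (mincore((void *)base, (size_t)page, &residency) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1;
}
```

Called with the address of a local variable, this should report a mapped page; called with (void *)0 on a typical Linux process, it should report an unmapped one.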

On the x86 platform the memory around address 0 is typically made inaccessible by means of virtual-to-physical address translation. The page tables are set up so that the entry for virtual address 0 is not backed by a page of physical memory. A page is usually 4 KB in size, not just a handful of bytes, so if you take out address 0, you take out addresses 1 through 4095 as well. It's also reasonable to take out more than 4 KB of the address space at address 0. The reason is pointers to structures in C and C++. You can have a NULL pointer to a structure, and when you dereference it, the attempted memory access occurs at the address contained in the pointer (0) plus the offset of the structure member you're trying to access from the beginning of the structure (0 for the first member, greater than 0 for the rest). For a large structure that offset can exceed 4 KB, so a wider inaccessible region catches more of these accesses.
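To make the structure-offset argument concrete, here is a small sketch. The struct big_record type and both function names are invented for the example, and it assumes NULL is represented as address 0, which is the case on Linux/x86. It uses offsetof to compute the address a member access through a NULL pointer would touch, without actually dereferencing anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure, for illustration only: the large leading
   array pushes the last member well past the first 4 KB page. */
struct big_record {
    char payload[8192];
    int  status;
};

/* Address that ((struct big_record *)NULL)->status would access,
   assuming NULL is address 0. Nothing is dereferenced here. */
uintptr_t null_member_address(void)
{
    /* The first member sits at offset 0, so a NULL structure pointer
       and a pointer to its first member are the same address. */
    assert(offsetof(struct big_record, payload) == 0);
    return (uintptr_t)0 + offsetof(struct big_record, status);
}

/* Returns 1 if that access would land beyond an inaccessible region
   of guard_bytes bytes starting at address 0, and 0 otherwise. */
int escapes_guard(size_t guard_bytes)
{
    return null_member_address() >= guard_bytes;
}
```

With this layout the access lands at address 8192, so escapes_guard(4096) reports that a single inaccessible page would miss it, while a guard region reaching up to 0x08048000 would catch it.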

There may be some other considerations for choosing specific ranges of addresses for programs, but I cannot speak for all of them. The OS may want to keep some program-related stuff (data structures) within the program itself, so why not use a fixed location for that near one of the ends of the accessible portion of the address space?

Alexey Frunze answered Oct 06 '22 01:10

Let's start by saying this: most of the time the various sections do not need to be placed at a specific location; what matters more is the layout. Nowadays the stack top is actually randomised (address space layout randomisation, ASLR).

0x08048000 is the default address at which ld places the first PT_LOAD segment on Linux/x86. On Linux/amd64 the default is 0x400000. You can change the default with a custom linker script, or move the start of the .text section by passing the -Wl,-Ttext,0xNNNNNNNN flag to gcc. To understand why .text is not mapped at address 0, keep in mind that NULL is usually defined as ((void *) 0) for convenience. It is useful, then, for the zero page to be inaccessible, so that uses of NULL pointers trap. The memory below the start of .text is actually used by a lot of things; take the output of cat /proc/self/maps as an example:

$ cat /proc/self/maps 
001c0000-00317000 r-xp 00000000 08:01 245836     /lib/libc-2.12.1.so
00317000-00318000 ---p 00157000 08:01 245836     /lib/libc-2.12.1.so
00318000-0031a000 r--p 00157000 08:01 245836     /lib/libc-2.12.1.so
0031a000-0031b000 rw-p 00159000 08:01 245836     /lib/libc-2.12.1.so
0031b000-0031e000 rw-p 00000000 00:00 0 
00376000-00377000 r-xp 00000000 00:00 0          [vdso]
00852000-0086e000 r-xp 00000000 08:01 245783     /lib/ld-2.12.1.so
0086e000-0086f000 r--p 0001b000 08:01 245783     /lib/ld-2.12.1.so
0086f000-00870000 rw-p 0001c000 08:01 245783     /lib/ld-2.12.1.so
08048000-08051000 r-xp 00000000 08:01 2244617    /bin/cat
08051000-08052000 r--p 00008000 08:01 2244617    /bin/cat
08052000-08053000 rw-p 00009000 08:01 2244617    /bin/cat
09ab5000-09ad6000 rw-p 00000000 00:00 0          [heap]
b7502000-b7702000 r--p 00000000 08:01 4456455    /usr/lib/locale/locale-archive
b7702000-b7703000 rw-p 00000000 00:00 0 
b771b000-b771c000 r--p 002a1000 08:01 4456455    /usr/lib/locale/locale-archive
b771c000-b771e000 rw-p 00000000 00:00 0 
bfbd9000-bfbfa000 rw-p 00000000 00:00 0          [stack]

What we see here, below the program's own mappings at 0x08048000, is the C library, the dynamic loader ld.so and the kernel vDSO (a small shared library that the kernel maps into every process to provide some interfaces to the kernel). Note that the start of the heap is also randomised.
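A program can inspect the same listing for itself. The following is a minimal sketch, assuming Linux with /proc mounted; the function names find_mapping and in_range are made up for the example. It scans /proc/self/maps for a named region such as [stack] or [heap] and returns its address range.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Finds the first line of /proc/self/maps that contains `name`
   (for example "[stack]") and stores that mapping's address range.
   Returns 1 on success, 0 if the file cannot be read or nothing matches. */
int find_mapping(const char *name, unsigned long *start, unsigned long *end)
{
    assert(name && start && end);

    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return 0;

    char line[512];
    int found = 0;
    while (!found && fgets(line, sizeof line, maps)) {
        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, name) && sscanf(line, "%lx-%lx", start, end) == 2)
            found = 1;
    }
    fclose(maps);
    return found;
}

/* Returns 1 if addr lies inside the half-open range [start, end). */
int in_range(const void *addr, unsigned long start, unsigned long end)
{
    unsigned long a = (unsigned long)addr;
    return a >= start && a < end;
}
```

Looking up "[stack]" from the main thread and then passing the address of a local variable to in_range should succeed. Printing the range on several runs is an easy way to watch the randomisation mentioned above.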

Michael Foukarakis answered Oct 06 '22 01:10