This is my program:
void test_function(int a, int b, int c, int d){
int flag;
char buffer[10];
flag = 31337;
buffer[0] = 'A';
}
int main() {
test_function(1, 2, 3, 4);
}
I compile this program with the debug option:
gcc -g my_program.c
I use gdb and disassemble test_function with Intel syntax:
(gdb) disassemble test_function
Dump of assembler code for function test_function:
0x08048344 <test_function+0>: push ebp
0x08048345 <test_function+1>: mov ebp,esp
0x08048347 <test_function+3>: sub esp,0x28
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
0x08048351 <test_function+13>: mov BYTE PTR [ebp-40],0x41
0x08048355 <test_function+17>: leave
0x08048356 <test_function+18>: ret
End of assembler dump.
And I disassemble main:
(gdb) disassemble main
Dump of assembler code for function main:
0x08048357 <main+0>: push ebp
0x08048358 <main+1>: mov ebp,esp
0x0804835a <main+3>: sub esp,0x18
0x0804835d <main+6>: and esp,0xfffffff0
0x08048360 <main+9>: mov eax,0x0
0x08048365 <main+14>: sub esp,eax
0x08048367 <main+16>: mov DWORD PTR [esp+12],0x4
0x0804836f <main+24>: mov DWORD PTR [esp+8],0x3
0x08048377 <main+32>: mov DWORD PTR [esp+4],0x2
0x0804837f <main+40>: mov DWORD PTR [esp],0x1
0x08048386 <main+47>: call 0x8048344 <test_function>
0x0804838b <main+52>: leave
0x0804838c <main+53>: ret
End of assembler dump.
I place a breakpoint at address 0x08048355 (the leave instruction in test_function) and run the program.
I examine the stack like this:
(gdb) x/16w $esp
0xbffff7d0: 0x00000041 0x08049548 0xbffff7e8 0x08048249
0xbffff7e0: 0xb7f9f729 0xb7fd6ff4 0xbffff818 0x00007a69
0xbffff7f0: 0xb7fd6ff4 0xbffff8ac 0xbffff818 0x0804838b
0xbffff800: 0x00000001 0x00000002 0x00000003 0x00000004
0x0804838b is the return address, 0xbffff818 is the saved frame pointer (main's ebp), and the flag variable is stored 12 bytes below the saved frame pointer. Why 12?
I don't understand this instruction:
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
Why isn't the variable's value 0x00007a69 stored at ebp-4, where 0xbffff8ac currently is?
Same question for buffer. Why 40?
Isn't this wasting memory? It looks as if 0xb7fd6ff4 0xbffff8ac and 0xb7f9f729 0xb7fd6ff4 0xbffff818 0x08049548 0xbffff7e8 0x08048249 are not used.
This is the output of the command gcc -Q -v -g my_program.c:
Reading specs from /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --enable-__cxa_atexit --with-system-zlib --enable-nls --without-included-gettext --enable-clocale=gnu --enable-debug i486-linux-gnu
Thread model: posix
gcc version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1)
/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/cc1 -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6 notesearch.c -dumpbase notesearch.c -auxbase notesearch -g -version -o /tmp/ccGT0kTf.s
GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1) (i486-linux-gnu)
compiled by GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1).
GGC heuristics: --param ggc-min-expand=99 --param ggc-min-heapsize=129473
options passed: -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6
-auxbase -g
options enabled: -fpeephole -ffunction-cse -fkeep-static-consts
-fpcc-struct-return -fgcse-lm -fgcse-sm -fsched-interblock -fsched-spec
-fbranch-count-reg -fcommon -fgnu-linker -fargument-alias
-fzero-initialized-in-bss -fident -fmath-errno -ftrapping-math -m80387
-mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387
-maccumulate-outgoing-args -mcpu=pentiumpro -march=i486
ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
ignoring nonexistent directory "/usr/i486-linux-gnu/include"
ignoring nonexistent directory "/usr/include/i486-linux-gnu"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/include
/usr/include
End of search list.
gnu_dev_major gnu_dev_minor gnu_dev_makedev stat lstat fstat mknod fatal ec_malloc dump main print_notes find_user_note search_note
Execution times (seconds)
preprocessing : 0.00 ( 0%) usr 0.01 (25%) sys 0.00 ( 0%) wall
lexical analysis : 0.00 ( 0%) usr 0.01 (25%) sys 0.00 ( 0%) wall
parser : 0.02 (100%) usr 0.01 (25%) sys 0.00 ( 0%) wall
TOTAL : 0.02 0.04 0.00
as -V -Qy -o /tmp/ccugTYeu.o /tmp/ccGT0kTf.s
GNU assembler version 2.17.50 (i486-linux-gnu) using BFD version 2.17.50 20070103 Ubuntu
/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crt1.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crti.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtbegin.o -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6 -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../.. /tmp/ccugTYeu.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtend.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crtn.o
NOTE: I am reading the book "The Art of Exploitation" and I use the VM provided with the book.
The compiler is trying to maintain 16 byte alignment on the stack. This also applies to 32-bit code these days (not just 64-bit). The idea is that at the point before executing a CALL instruction the stack must be aligned to a 16-byte boundary.
Because you compiled with no optimizations there are some extraneous instructions.
0x0804835a <main+3>: sub esp,0x18 ; Allocate local stack space
0x0804835d <main+6>: and esp,0xfffffff0 ; Ensure `main` has a 16 byte aligned stack
0x08048360 <main+9>: mov eax,0x0 ; Extraneous, not needed
0x08048365 <main+14>: sub esp,eax ; Extraneous, not needed
ESP is now 16-byte aligned after the last instruction above. We move the parameters for the call starting at the top of the stack at ESP. That is done with:
0x08048367 <main+16>: mov DWORD PTR [esp+12],0x4
0x0804836f <main+24>: mov DWORD PTR [esp+8],0x3
0x08048377 <main+32>: mov DWORD PTR [esp+4],0x2
0x0804837f <main+40>: mov DWORD PTR [esp],0x1
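As a rough illustration of what and esp,0xfffffff0 does, the masking can be written in C. This is only a sketch: align_down is a hypothetical helper name, and the addresses used to exercise it are read off the stack dump in the question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a 32-bit address down to a multiple of
   `alignment` (a power of two). With alignment == 16 the mask is
   0xfffffff0, the same constant used in `and esp,0xfffffff0`. */
static uint32_t align_down(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

With main's frame pointer 0xbffff818 from the dump, sub esp,0x18 leaves ESP at 0xbffff800. That is already a multiple of 16, so in this particular run the AND leaves ESP unchanged, and the four arguments land at 0xbffff800 through 0xbffff80c.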
The CALL then pushes a 4 byte return address on the stack. We then reach these instructions after the call:
0x08048344 <test_function+0>: push ebp ; 4 bytes pushed on stack
0x08048345 <test_function+1>: mov ebp,esp ; Setup stackframe
This pushes another 4 bytes on the stack. With the 4 bytes from the return address we are now misaligned by 8 bytes. To reach 16-byte alignment again we will need to waste an additional 8 bytes on the stack. That is why in this statement there is an additional 8 bytes allocated:
0x08048347 <test_function+3>: sub esp,0x28
The 8 bytes of alignment padding plus the 0x20 (32) bytes reserved for local variables add up to the value 0x28 computed by the compiler and used in sub esp,0x28.
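The arithmetic behind 0x28 can be sketched in C under the model described above. The function names are hypothetical, and the 0x20 bytes of local storage is inferred from 0x28 minus the 8 padding bytes rather than taken from GCC itself.

```c
#include <assert.h>

enum {
    STACK_ALIGN  = 16,  /* alignment the compiler tries to maintain */
    RET_ADDR_SZ  = 4,   /* pushed by CALL */
    SAVED_EBP_SZ = 4    /* pushed by `push ebp` */
};

/* Padding needed after the return address and saved EBP to get back
   to a 16-byte boundary. */
static unsigned padding_after_prologue(void)
{
    unsigned used = RET_ADDR_SZ + SAVED_EBP_SZ;
    return (STACK_ALIGN - used % STACK_ALIGN) % STACK_ALIGN;
}

/* Operand of `sub esp,N` for `locals` bytes of local storage, where
   `locals` is assumed to already be a multiple of 16. */
static unsigned sub_esp_operand(unsigned locals)
{
    assert(locals % STACK_ALIGN == 0);
    return padding_after_prologue() + locals;
}
```

Under these assumptions padding_after_prologue() is 8 and sub_esp_operand(0x20) is 0x28, matching the instruction in the disassembly.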
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
So why [ebp-12] in this instruction? The first 8 bytes, [ebp-8] through [ebp-1], are the alignment bytes used to get the stack 16-byte aligned. The local data then appears on the stack after that. In this case [ebp-12] through [ebp-9] are the 4 bytes for the 32-bit integer flag.
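To tie the EBP-relative offsets back to the stack dump in the question, here is a small C sketch. The helper name ebp_relative is hypothetical, and the EBP value is read off the dump: the saved frame pointer 0xbffff818 sits at address 0xbffff7f8, so that is EBP inside test_function.

```c
#include <stdint.h>

/* EBP inside test_function, taken from the question's dump: the slot
   holding the saved frame pointer 0xbffff818 is at 0xbffff7f8. */
#define TEST_FUNCTION_EBP 0xbffff7f8u

/* Hypothetical helper: address of [ebp+offset] in a 32-bit frame.
   Unsigned wraparound makes negative offsets subtract as expected. */
static uint32_t ebp_relative(uint32_t ebp, int32_t offset)
{
    return ebp + (uint32_t)offset;
}
```

ebp_relative(TEST_FUNCTION_EBP, -12) gives 0xbffff7ec, the slot that holds 0x00007a69 in the dump, and an offset of -40 gives 0xbffff7d0, the slot that holds 0x00000041.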
Then we have this for updating buffer[0] with the character 'A':
0x08048351 <test_function+13>: mov BYTE PTR [ebp-40],0x41
The oddity then would be why a 10-byte array of characters would occupy [ebp-40] (beginning of array) to [ebp-13], which is 28 bytes. The best guess I can make is that the compiler felt it could treat the 10-byte character array as a 128-bit (16-byte) vector. This would force the compiler to align the buffer on a 16-byte boundary and pad the array out to 16 bytes (128 bits). From the perspective of the compiler, your code seems to be acting much like it was defined as:
#include <xmmintrin.h>
void test_function(int a, int b, int c, int d){
int flag;
union {
char buffer[10];
__m128 m128buffer; /* 16-byte variable that needs to be 16-byte aligned */
} bufu;
flag = 31337;
bufu.buffer[0] = 'A';
}
The output on Godbolt for GCC 4.9.0 generating 32-bit code with SSE2 enabled appears as follows:
test_function:
push ebp #
mov ebp, esp #,
sub esp, 40 #,same as: sub esp,0x28
mov DWORD PTR [ebp-12], 31337 # flag,
mov BYTE PTR [ebp-40], 65 # bufu.buffer,
leave
ret
This looks very similar to your disassembly in GDB.
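The size and alignment that such a union would impose can be checked with a portable C11 sketch. Note the assumption: instead of __m128 from xmmintrin.h it uses a hypothetical stand-in type, vec128_standin, declared with _Alignas(16), so it models the guess above rather than reproducing what GCC 3.3.6 actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for __m128 (an assumption for portability): 16 bytes that
   must sit on a 16-byte boundary. */
typedef struct { _Alignas(16) unsigned char bytes[16]; } vec128_standin;

union bufu_model {
    char buffer[10];
    vec128_standin v;
};

/* The union takes the size and alignment of its strictest member. */
static_assert(sizeof(union bufu_model) == 16, "padded out to 16 bytes");

static size_t bufu_size(void)  { return sizeof(union bufu_model); }
static size_t bufu_align(void) { return _Alignof(union bufu_model); }
```

Both helpers return 16, which is consistent with a 10-byte array being padded and aligned as if it were a 128-bit vector.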
If you compiled with optimizations (such as -O1, -O2, -O3), the optimizer could have simplified test_function because it is a leaf function in your example. A leaf function is one that doesn't call another function. Certain shortcuts could have been applied by the compiler.
As for why the character array seems to be aligned to a 16-byte boundary and padded to 16 bytes, that probably can't be answered with certainty until we know which GCC version you are using (gcc --version will tell you). It would also be useful to know your OS and OS version. Even better would be to add the output of this command to your question: gcc -Q -v -g my_program.c
Unless you're trying to improve gcc's code itself, understanding why un-optimized code is as bad as it is will mostly be a waste of time. Look at output from -O3 if you want to see what a compiler does with your code, or from -Og if you want to see a more literal translation of your source into asm. Write functions that take input in args and produce output in globals or return values, so the optimized asm isn't just ret.
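As a sketch of that advice, here is a variant of test_function whose results are observable. The changed signature and the exact computation are illustrative assumptions, not something taken from the question.

```c
int flag;   /* global output: the store is visible outside the function */

/* Takes its input in args, writes one result to a global and returns
   another, so an optimizing compiler cannot reduce the body to `ret`. */
int test_function(int a, int b)
{
    flag = 31337 + a;
    return a * b;
}
```

Compiling this with -O3 or -Og and reading the asm shows only the work the compiler could not prove unnecessary.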
You shouldn't expect anything efficient from gcc -O0. It makes the most braindead literal translation of your source.
I can't reproduce that asm output with any gcc or clang version on http://gcc.godbolt.org/ (gcc 4.4.7 to gcc 5.3.0, clang 3.0 to clang 3.7.1). (Note that Godbolt uses g++, but you can use -x c to treat the input as C instead of compiling it as C++. This can sometimes change the asm output, even when you don't use any features that C99 / C11 has but C++ doesn't, e.g. C99 variable-length arrays.)
Some versions of gcc default to emitting extra code unless I use -fno-stack-protector.
I thought at first that the extra space reserved by test_function was to copy its args down into its stack frame, but at least modern gcc doesn't do this. (64-bit gcc does store its args into memory when they arrive in registers, but that's different. 32-bit gcc will increment an arg in place on the stack, without copying it.)
The ABI does allow the called function to clobber its args on the stack, so a caller that wanted to make repeated function calls with the same args would have to keep storing them between calls.
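At the C level this corresponds to a parameter being the callee's own object. A minimal sketch, with bump as a hypothetical function name:

```c
/* The parameter n belongs to the callee. Modifying it is allowed, and
   a 32-bit compiler may do so directly in the argument's stack slot. */
static int bump(int n)
{
    n += 1;
    return n;
}
```

The caller's variable keeps its value after the call, but the caller cannot assume the argument slot on the stack still holds what it stored there, which is why the args have to be stored again before a repeated call.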
clang 3.7.1 with -O0 does copy its args down into locals, but that still only reserves 32 (0x20) bytes.
This is about the best answer you're going to get unless you tell us which version of gcc you're using...