My hello & regards to all. I have a C program, basically wrote for testing Buffer overflow.
#include<stdio.h>
void display()
{
char buff[8];
gets(buff);
puts(buff);
}
main()
{
display();
return(0);
}
Now i disassemble display and main sections of it using GDB. The code:-
Dump of assembler code for function main:
0x080484ae <+0>: push %ebp # saving ebp to stack
0x080484af <+1>: mov %esp,%ebp # saving esp in ebp
0x080484b1 <+3>: call 0x8048474 <display> # calling display function
0x080484b6 <+8>: mov $0x0,%eax # move 0 into eax , but WHY ????
0x080484bb <+13>: pop %ebp # remove ebp from stack
0x080484bc <+14>: ret # return
End of assembler dump.
Dump of assembler code for function display:
0x08048474 <+0>: push %ebp #saves ebp to stack
0x08048475 <+1>: mov %esp,%ebp # saves esp to ebp
0x08048477 <+3>: sub $0x10,%esp # making 16 bytes space in stack
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
0x08048480 <+12>: mov %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
0x08048485 <+17>: lea -0xc(%ebp),%eax #Load effective address of 12 bytes
lower placed value ( WHY???? )
0x08048488 <+20>: mov %eax,(%esp) #make esp point to the address inside of eax
0x0804848b <+23>: call 0x8048374 <gets@plt> # calling get, what is "@plt" ????
0x08048490 <+28>: lea -0xc(%ebp),%eax # LEA of 12 bytes lower to eax
0x08048493 <+31>: mov %eax,(%esp) # make esp point to eax contained address
0x08048496 <+34>: call 0x80483a4 <puts@plt> # again what is "@plt" ????
0x0804849b <+39>: mov -0x4(%ebp),%eax # move (ebp - 4) location's contents to eax
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail@plt> # not known to me
0x080484ac <+56>: leave # a new instruction, not known to me
0x080484ad <+57>: ret # return to MAIN's next instruction
End of assembler dump.
So folks, you should consider my homework. Rest all of the code is known to me, except few lines. I have included a big "WHY ????" and some more questions in the comments ahead of each line. The first hurdle for me is "mov %gs:0x14,%eax" instruction, I cant make flow chart after this instruction. Somebody plz explain me, what these few instructions are meant for and doing what in the program? Thanks...
In the AT&T style assembly languages, the percent sigil generally indicates a register. In x86 family processors from 386 onwards, GS is one of the so-called segment registers. However, in protected mode environments segment registers work as selector registers.
mov [eax], [ebx] is attempting to move one memory location (dereferenced from the value in the ebx register) to the memory location referred to in eax .
0x080484b6 <+8>: mov $0x0,%eax # move 0 into eax , but WHY ????
Don't you have this?:
return(0);
They are probably related. :)
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
It means reading 4 bytes into eax from memory at address gs:0x14. gs is a segment register. Most likely thread-local storage (AKA TLS
) is referenced through this register.
0x08048483 <+15>: xor %eax,%eax # xor eax with itself (but WHY??)
Don't know. Could be optimization-related.
0x08048485 <+17>: lea -0xc(%ebp),%eax #Load effective address of 12 bytes
lower placed value ( WHY???? )
It makes eax point to a local variable that lives on the stack. sub $0x10,%esp
allocated some space for them.
0x08048488 <+20>: mov %eax,(%esp) #make esp point to the address inside of eax
Wrong. It writes eax to the stack, to the stack top. It will be passed as an on-stack argument to the called function:
0x0804848b <+23>: call 0x8048374 <gets@plt> # calling get, what is "@plt" ????
I don't know. Could be some name mangling.
By now you should've guessed what local variable that was. buff
, what else could it be?
0x080484ac <+56>: leave # a new instruction, not known to me
Why don't you look it up in the CPU manual?
Now, I can probably explain you the gs/TLS thing...
0x08048474 <+0>: push %ebp #saves ebp to stack
0x08048475 <+1>: mov %esp,%ebp # saves esp to ebp
0x08048477 <+3>: sub $0x10,%esp # making 16 bytes space in stack
0x0804847a <+6>: mov %gs:0x14,%eax # what does it mean ????
0x08048480 <+12>: mov %eax,-0x4(%ebp) # move eax contents to 4 bytes lower in stack
...
0x0804849b <+39>: mov -0x4(%ebp),%eax # move (ebp - 4) location's contents to eax
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ????
0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me
0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail@plt> # not known to me
0x080484ac <+56>
So, this code takes a value from the TLS (at gs:0x14) and stores it right below the saved ebp value (at ebp-4). Then there's your stuff with get()
and put()
. Then this code checks whether the copy of the value from the TLS is unchanged. xor %gs:0x14,%eax
does the compare.
If XORed values are the same, the result of the XOR is 0 and flags.zf is 1. Else, the result isn't 0 and flags.zf is 0.
je 0x80484ac <display+56>
checks flags.zf and skips call 0x8048394 <__stack_chk_fail@plt>
if flags.zf = 1. IOW, this call is skipped if the copy of the value from the TLS is unchanged.
What is that all about? That's a way to try to catch a buffer overflow. If you write beyond the end of the buffer, you will overwrite that value copied from the TLS to the stack.
Why do we take this value from the TLS, why not just a constant, hard-coded value? We probably want to use different, non-hard-coded values to catch overflows more often (and so the value in the TLS will change from a run to another run of your program and it will be different in different threads of your program). That also lowers chances of successfully exploiting the buffer overflow by an attacker if the value is chosen randomly each time your program runs.
Finally, if the copy of the value is found to have been overwritten due to a buffer overflow, call 0x8048394 <__stack_chk_fail@plt>
will call a special function dedicated to doing whatever's necessary, e.g. reporting a problem and terminating the program.
0x0804849e <+42>: xor %gs:0x14,%eax # # again what is this ???? 0x080484a5 <+49>: je 0x80484ac <display+56> # Not known to me 0x080484a7 <+51>: call 0x8048394 <__stack_chk_fail@plt> # not known to me 0x080484ac <+56>: leave # a new instruction, not known to me 0x080484ad <+57>: ret # return to MAIN's next instruction
The gs segment can be used for thread local storage. E.g. it's used for errno
, so that each thread in a multi-threaded program effectively has its own errno variable.
The function name above is a big clue. This must be a stack canary.
(leave
is some CISC instruction that does everything you need to do before the actual ret. I don't know the details).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With