Consider the following minimal C program:
Case Number 1:
#include <stdio.h>
#include <string.h>

void foo(char* s)
{
    char buffer[10];
    strcpy(buffer, s);
}

int main(void)
{
    foo("01234567890134567");
}
This doesn't cause a crash dump.
If I add just one more character, so that the new main is:
Case Number 2:
int main(void)
{
    foo("012345678901345678"); /* one extra character: the trailing '8' */
}
The program crashes with a Segmentation fault.
It looks like, in addition to the 10 characters reserved on the stack, there is room for 8 more characters, so the first program doesn't crash. However, if you add one more character you start accessing invalid memory. My questions are:
Another doubt I have is how the OS (Windows in this case) detects the bad memory access. According to the Windows documentation, the default stack size is 1 MB. So I don't see how the OS detects that the address being accessed is outside the process memory, especially when the minimum page size is normally 4 KB. Does the OS use the SP in this case to check the address?
PS: I'm using the following environment for testing:
Cygwin
GCC 4.8.3
Windows 7 OS
EDIT:
This is the assembly generated by http://gcc.godbolt.org/# using GCC 4.8.2; GCC 4.8.3 is not among the available compilers, but I guess the generated code should be similar. I built the code without any flags. I hope somebody with assembly expertise can shed some light on what is happening in the foo function and why the extra character causes the segfault.
foo(char*):
    pushq %rbp
    movq %rsp, %rbp
    subq $48, %rsp
    movq %rdi, -40(%rbp)
    movq %fs:40, %rax
    movq %rax, -8(%rbp)
    xorl %eax, %eax
    movq -40(%rbp), %rdx
    leaq -32(%rbp), %rax
    movq %rdx, %rsi
    movq %rax, %rdi
    call strcpy
    movq -8(%rbp), %rax
    xorq %fs:40, %rax
    je .L2
    call __stack_chk_fail
.L2:
    leave
    ret
.LC0:
    .string "01234567890134567"
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $.LC0, %edi
    call foo(char*)
    movl $0, %eax
    popq %rbp
    ret
I believe you understand that you have implemented something that leads to undefined behavior, so it is hard to say why it fails with the longer string and not with the original one. It probably depends on compiler internals and on the compilation flags (alignment, optimization level, etc.).
You can disassemble the binary, or have the compiler emit assembly, and see exactly where the buffer is placed on the stack. Repeating this at different optimization levels lets you inspect how the assembly and the behavior change.
how does the OS (Windows in this case) detect the bad memory access? According to the Windows documentation, the default stack size is 1 MB. So I don't see how the OS detects that the address being accessed is outside the process memory, especially when the minimum page size is normally 4 KB. Does the OS use the SP in this case to check the address?
The OS doesn't monitor the code you execute; the hardware (the CPU) does, since it executes the code. When your code accesses an address that was not allocated to your process (not mapped by the OS for your program), the CPU raises a page fault (#PF) exception, and that is how the OS gets notified. Another case is accessing an address that was allocated to you but with improper permissions (for example, trying to execute binary data from a DATA page that has no 'execute' permission). Or you jump into a CODE page at a wrong offset, and the bytes you read there are not a valid instruction or (even worse) decode to something you don't expect (did we say undefined behavior before?).
In general, your code most likely doesn't fail in strcpy (it can, if you write enough data to reach a forbidden address, but that is most likely not the case here); it fails when it returns from foo. strcpy simply overwrote the saved return address, which points to the instruction following the call to foo. So the instruction pointer is loaded with data from the "012345678901345678" string, the CPU tries to fetch the next instruction from that junk address, and it fails for the reasons mentioned above.
This "method"/bug is called a "buffer overflow attack" and is widely used by attackers to make your code (and more often OS/BIOS/VMM/SMM code, which executes with higher privileges) run malicious code that they supply. The attacker just has to overwrite the return address with the address of code prepared in advance.