 

Help with C Pointers

Tags:

c

I have a question regarding pointers. Can someone help explain the following to me?

I do understand how pointers work; however, I'm not too sure how overwriting parts of memory through addresses modifies the behavior of the program.

I will explain the following as much as I can according to what I understand; feel free to critique and enlighten me on my misunderstandings. Here's the code chunk:

#include <stdio.h>
#include <string.h>

void f(int) ;
int main ( int argc, char ** argv ) {
    int a = 1234 ;
    f(a);
    printf("Back to main\n") ;
}
void g() {
    printf("Inside g\n") ;
}
void f (int x) {
    int a[100] ;
    memcpy((char*)a,(char*)g,399) ;
    x = *(&x-1) ;
    *(&x-1) = (int)(&a) ; // note the cast; no cast -> error
    // find an index for a such that a[your_index] is the same as x
    printf("About to return from f\n") ;
}

//This program, compiled with the same compiler as above, produces the following output:

//About to return from f
//Inside g
//Back to main

OK, from what I understand, this is how it goes.

The program begins in main(): it assigns a, then calls f() with a as the argument.

Inside f():

It declares an array a of size 100, then copies the memory occupied by g() into the entire a array. So now, essentially, a[] is g(). x is then assigned to the address of the original a from main() - 1, which I would assume is the address of main(). (I am not sure about this; correct me if I'm wrong.)

From here onwards, I'm not too sure how it manages to call a[] (the one that was overwritten with g()) or even g(). It just seems to end f() and go back to main().

Thanks to whoever can help me out with this!

Cheers!

asked Sep 03 '09 09:09 by nubela


1 Answer

Technically, that code goes far outside what the C standard defines: its behaviour is undefined, so it could do anything. It's making a huge number of assumptions that it has no right to, and those assumptions are certainly not universally true. However, I can put forward a highly likely explanation for why you see the output you do:

You're correct up to the point where you have copied the code of the function g() into the memory occupied by the local array variable a.

To understand the next line, you need to know a little about how functions tend to be called on common stack-based architectures. When a function is called, the parameters are pushed onto the stack, then the return address is pushed onto the stack, and execution jumps to the start point of the function. Within the function, the previous frame pointer is pushed onto the stack, then room is made for the local variables. Stacks tend to grow downwards in memory (from high addresses to low addresses), although this is not the case on all common architectures.
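The call sequence described above can be modelled with an ordinary array standing in for stack memory. This is a toy sketch, not real machine behaviour: the names toy_stack, toy_call and toy_prologue are hypothetical, every slot is one whole word, and the frame pointer is just an array index. Pushing moves the index down before storing, which mirrors a downward-growing stack.

```c
#include <stddef.h>
#include <stdint.h>

// Toy model of a downward-growing stack (hypothetical; no bounds checking).
#define TOY_STACK_WORDS 16

struct toy_stack {
    uintptr_t mem[TOY_STACK_WORDS]; // mem[TOY_STACK_WORDS - 1] is the highest address
    size_t sp;                      // index of the most recently pushed word
};

static void toy_init(struct toy_stack *s) {
    s->sp = TOY_STACK_WORDS;        // empty: sp sits just past the highest slot
}

// Push: move sp down one slot, then store (like x86 PUSH).
static void toy_push(struct toy_stack *s, uintptr_t value) {
    s->mem[--s->sp] = value;
}

// Pop: load the word at sp, then move sp up one slot (like x86 POP).
static uintptr_t toy_pop(struct toy_stack *s) {
    return s->mem[s->sp++];
}

// Caller side of a call f(arg): push the argument, then the return address
// (on x86 the CALL instruction pushes the return address).
static void toy_call(struct toy_stack *s, uintptr_t arg, uintptr_t ret_addr) {
    toy_push(s, arg);
    toy_push(s, ret_addr);
}

// Callee prologue: save the caller's frame pointer, point fp at that saved
// copy, then reserve n_locals words for local variables.
static void toy_prologue(struct toy_stack *s, size_t *fp, size_t n_locals) {
    toy_push(s, (uintptr_t)*fp);
    *fp = s->sp;
    s->sp -= n_locals;
}
```

After toy_call and toy_prologue, the argument sits highest, then the return address, then the saved frame pointer, with the locals below it: the same order as in the diagrams that follow.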

So, when main calls into function f(), the stack initially looks like this (the frame pointer and stack pointer are two CPU registers containing addresses of locations on the stack):

                    | ...                     | (higher addresses)
                    | char **argv (parameter) |
                    |-------------------------|
                    | int argc (parameter)    |
                    |-------------------------|
FRAME POINTER ->    | saved frame pointer     |
                    |-------------------------|
                    | int a                   |
                    |-------------------------|
                    | int x (parameter)       | &x
                    |-------------------------|
STACK POINTER ->    | return address          | &x - 1
                    |-------------------------|
                    | ...                     | (lower addresses)

The function prologue then saves the calling function's frame pointer and moves the stack pointer to create space for the local variables in f(). So when the C code in f() starts executing, the stack now looks something like this:

                    | ...                     | (higher addresses)
                    | char **argv (parameter) |
                    |-------------------------|
                    | int argc (parameter)    |
                    |-------------------------|
                    | saved frame pointer     |
                    |-------------------------|
                    | int a                   |
                    |-------------------------|
                    | int x (parameter)       | &x
                    |-------------------------|
                    | return address          | &x - 1
                    |-------------------------|
FRAME POINTER ->    | saved frame pointer     | 
                    |-------------------------|
                    | a[99]                   | &a[99]
                    | a[98]                   | &a[98]
                    | ...                     | ...
STACK POINTER ->    | a[0]                    | &a[0]
                    | ...                     | (lower addresses)

What is the frame pointer? It's used to reference local variables and parameters within a function. The compiler knows that when f() is executing, the address of local variable a is always FRAME_POINTER - 100 * sizeof(int), and the address of parameter x is FRAME_POINTER + sizeof(FRAME_POINTER) + sizeof(RETURN_ADDRESS). All local variables and parameters can be accessed as a fixed offset from the frame pointer, no matter how the stack pointer moves around as stack space is allocated and deallocated.
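The fixed-offset addressing can be written out as plain integer arithmetic. This sketch assumes the layout in the diagram above and takes the sizes as parameters (4 bytes each on the 32-bit systems this trick targets); local_array_addr and param_addr are hypothetical helpers, not anything a compiler exposes.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helpers: addresses relative to a frame-pointer value, assuming
// the saved frame pointer sits at fp, the locals directly below it, and the
// return address plus parameters directly above it.

// Start of a local array of n elements placed directly below the saved frame pointer.
static uintptr_t local_array_addr(uintptr_t fp, size_t n, size_t elem_size) {
    return fp - n * elem_size;
}

// First parameter: above the saved frame pointer and the return address.
static uintptr_t param_addr(uintptr_t fp, size_t saved_fp_size, size_t ret_addr_size) {
    return fp + saved_fp_size + ret_addr_size;
}
```

For example, with a hypothetical frame pointer of 0x1000 and 4-byte ints and addresses, a would start at 0xE70 and x would sit at 0x1008.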

Anyway, back to the code. When this line executes:

x = *(&x-1) ;

It copies into x the value stored one int-size lower in memory than x. If you look at my ASCII art, you'll see that that slot holds the return address. So that's actually performing this:

x = RETURN_ADDRESS;
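What *(&x-1) asks for is ordinary pointer arithmetic: subtracting 1 from an int * moves it down by sizeof(int) bytes. The well-defined form of that only works inside an array. A lone variable like x counts as an array of one element, so &x - 1 points outside it, which is why the original line relies on undefined behaviour. The helpers below (value_one_below and bytes_between are hypothetical names) show the defined case.

```c
#include <stddef.h>

// Returns the int stored one element below p. Defined only when p - 1
// still points into the same array as p.
static int value_one_below(const int *p) {
    return *(p - 1);
}

// Byte distance between p and p - 1: always sizeof(int) for an int pointer.
static ptrdiff_t bytes_between(const int *p) {
    return (const char *)p - (const char *)(p - 1);
}
```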

The following line:

*(&x-1) = (int)(&a) ;

Then sets the return address to the address of the array a. It's really saying:

RETURN_ADDRESS = &a;

The cast is required because you're treating the return address as an int, and not a pointer (so in fact, this code will only work on architectures where int is the same size as a pointer - this will NOT work on 64-bit POSIX systems, for example, where int is typically 32 bits but pointers are 64 bits!).
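If an integer round-trip of a pointer is genuinely needed, C99 provides uintptr_t in <stdint.h> for exactly that purpose (it is optional in the standard, but present on mainstream platforms). This sketch contrasts it with the int cast; int_can_hold_pointer and round_trip are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

// True only where int is at least as wide as a data pointer (e.g. 32-bit x86).
// On LP64 systems (64-bit Linux, macOS) int is 32 bits and pointers 64, so false.
static bool int_can_hold_pointer(void) {
    return sizeof(int) >= sizeof(void *);
}

// Round-trips a pointer through uintptr_t, which the standard guarantees
// converts back to a pointer that compares equal to the original.
static void *round_trip(void *p) {
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}
```

On a typical 64-bit Linux or macOS build, int_can_hold_pointer() returns false while round_trip still returns the original pointer.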

The C code in function f() now completes, and the function epilogue deallocates the local variables (by moving the stack pointer back) and restores the caller's frame pointer. At this point, the stack looks like:

                    | ...                     | (higher addresses)
                    | char **argv (parameter) |
                    |-------------------------|
                    | int argc (parameter)    |
                    |-------------------------|
FRAME POINTER ->    | saved frame pointer     |
                    |-------------------------|
                    | int a                   |
                    |-------------------------|
                    | int x (parameter)       | &x
                    |-------------------------|
STACK POINTER ->    | return address          | &x - 1
                    |-------------------------|
                    | saved frame pointer     | 
                    |-------------------------|
                    | a[99]                   | &a[99]
                    | a[98]                   | &a[98]
                    | ...                     | ...
                    | a[0]                    | &a[0]
                    | ...                     | (lower addresses)

Now the function returns by jumping to the value of RETURN_ADDRESS - but we set that to &a, so instead of going back to where it was called from, it jumps to the start of the array a - it's now executing code from the stack. This is where you copied the code from function g(), so that code (apparently) happily runs. Note that because the stack pointer has been moved back above the array here, any asynchronous code that is executed with the same stack (like a UNIX signal that arrives at the wrong moment) can overwrite the code!

So here's what the stack now looks like at the start of g(), before the function prologue:

                    | ...                     | (higher addresses)
                    | char **argv (parameter) |
                    |-------------------------|
                    | int argc (parameter)    |
                    |-------------------------|
FRAME POINTER ->    | saved frame pointer     |
                    |-------------------------|
                    | int a                   |
                    |-------------------------|
STACK POINTER ->    | int x (parameter)       | 
                    |-------------------------|
                    | return address          | 
                    |-------------------------|
                    | saved frame pointer     | 
                    |-------------------------|
                    | a[99]                   | 
                    | a[98]                   | 
                    | ...                     | 
                    | a[0]                    | 
                    | ...                     | (lower addresses)

g() then runs as normal: its prologue sets up a stack frame, its body executes, and its epilogue unwinds the frame, which leaves the frame pointer and stack pointer as in the last diagram above.

Now g() returns, so it looks for a return address at the top of the stack - but the top of the stack (where the stack pointer is pointing) is actually the location where the parameter x to function f() lived - and this is where we stashed the original return address earlier, so it returns to the place where f() was called from.

As a side note, the stack is now desynchronised in main(), because it expected the stack pointer to be where it was when it called f() (which is pointing to the spot where the parameter x was stored) - but now it's actually pointing at the local variable a. This would cause some weird effects - if you called another function from main at this point, the contents of a would be altered!
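The whole sequence, from overwriting the return address to main's stack pointer ending up one slot too high, can be re-enacted in a toy machine where "code addresses" are symbolic constants. This is a hedged model, not the real mechanism: nothing is executed from the stack, f's prologue and epilogue are left out because they cancel out, and all names (machine, run, ADDR_COPY_OF_G and so on) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Toy re-enactment of the return-address swap (hypothetical model).
enum { MACHINE_WORDS = 8 };
enum { ADDR_MAIN_CONT = 1, ADDR_COPY_OF_G = 2 }; // symbolic "code addresses"

struct machine {
    uintptr_t mem[MACHINE_WORDS]; // mem[MACHINE_WORDS - 1] is the highest address
    size_t sp;                    // index of the lowest occupied slot
    char trace[128];              // collected "printf" output
};

static void emit(struct machine *m, const char *line) {
    strncat(m->trace, line, sizeof m->trace - strlen(m->trace) - 1);
}

static void sim_push(struct machine *m, uintptr_t v) { m->mem[--m->sp] = v; }
static uintptr_t sim_pop(struct machine *m) { return m->mem[m->sp++]; }

// Body of f(x). Prologue and epilogue are omitted: they are balanced and
// leave sp on the return-address slot, which is all "ret" looks at.
static void f_body(struct machine *m) {
    size_t ret_slot = m->sp;            // plays the role of &x - 1
    size_t x_slot = ret_slot + 1;       // plays the role of &x
    m->mem[x_slot] = m->mem[ret_slot];  // x = *(&x-1);
    m->mem[ret_slot] = ADDR_COPY_OF_G;  // *(&x-1) = (int)(&a);
    emit(m, "About to return from f\n");
}

// Runs main's call f(a). Returns how many slots sp ends up above the
// position main expects after the call.
static long run(struct machine *m) {
    m->sp = MACHINE_WORDS;
    m->trace[0] = '\0';
    sim_push(m, 1234);                // main's local int a
    size_t expected_sp = m->sp - 1;   // main expects sp back on x's slot
    sim_push(m, 1234);                // argument x (a copy of a)
    sim_push(m, ADDR_MAIN_CONT);      // return address into main
    f_body(m);

    uintptr_t target = sim_pop(m);    // f's ret pops the forged address
    if (target == ADDR_COPY_OF_G) {
        emit(m, "Inside g\n");
        target = sim_pop(m);          // g's ret pops the old x slot
    }
    if (target == ADDR_MAIN_CONT)
        emit(m, "Back to main\n");
    return (long)m->sp - (long)expected_sp;
}
```

run() reproduces the three lines of output in the question's order and reports a drift of one slot, matching the desynchronised stack described above.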

I hope you (and others) have learned something valuable from this discussion, but it's important to remember that this is like the Five Point Palm Exploding Heart Technique of programming - NEVER use it in a real system. A new sub-architecture, compiler or even just different compiler flags can and will change the execution environment enough to make this kind of too-clever-by-half code fail completely in all sorts of delightful and amusing ways.

answered Oct 12 '22 10:10 by caf