
Can we change the base address of an array through a pointer to array using brute force?

Tags: arrays, c, pointers

Somebody wrote the following C program and asked why gcc allows one "to change the base address of an array". He was aware that the code is terrible but still wanted to know. I found the question interesting because the relationship between arrays and pointers in C is subtle (behold the use of the address operator on the array! "Why would anybody do that?"), confusing, and consequently often misunderstood. The question was deleted, but I thought I would ask it again with proper context and, I hope, a proper answer to go with it. Here is the original program.

static char* abc = "kj";

void fn(char**s)
{
   *s = abc;
}

int main() 
{
   char str[256];
   fn(&str);
}

It compiles with gcc (with warnings), links and runs. What happens here? Can we change the base address of an array by taking its address, casting it to pointer to pointer (after all, arrays are almost pointers in C, aren't they?) and assigning to it?

Peter - Reinstate Monica asked Sep 04 '14 11:09


3 Answers

It cannot work (even theoretically), because arrays are not pointers:


  • int arr[10]:

    • Amount of memory used is sizeof(int)*10 bytes

    • arr (once it decays to a pointer) and &arr hold the same address, but their types differ: int * versus int (*)[10]

    • arr designates a fixed block of memory; it is not a modifiable lvalue and cannot be made to refer to another memory address


  • int* ptr = malloc(sizeof(int)*10):

    • Amount of memory used is sizeof(int*) + sizeof(int)*10 bytes

    • The values of ptr and &ptr are unrelated (in practice, they are different)

    • ptr can be set to point to both valid and invalid memory addresses, as often as you like
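The differences in the list above can be checked with a few lines of C. This is only a minimal sketch; the function names are made up for illustration and do not come from the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* An array occupies exactly the storage of its elements, nothing more. */
static_assert(sizeof(int[10]) == 10 * sizeof(int), "array is just its elements");

/* Returns 1 if arr (decayed to int *) and &arr (int (*)[10]) hold the same address. */
int array_and_its_address_coincide(void)
{
    int arr[10];
    return (void *)arr == (void *)&arr;
}

/* Returns 1 if the pointer variable lives somewhere other than the block it
   points to, and -1 if the allocation failed. */
int pointer_and_its_address_differ(void)
{
    int *ptr = malloc(sizeof(int) * 10);
    if (ptr == NULL)
        return -1;
    int differ = (void *)ptr != (void *)&ptr;
    free(ptr);
    return differ;
}

/* sizeof sees the whole array ... */
size_t array_size(void)
{
    int arr[10];
    return sizeof arr;
}

/* ... but only the pointer object itself, not what it points to. */
size_t pointer_size(void)
{
    int *ptr = NULL;
    return sizeof ptr;
}
```

Called from a main function, the first two helpers should both return 1 on a typical implementation, and array_size() should come out as ten times sizeof(int) while pointer_size() is just sizeof(int *).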

barak manos answered Nov 15 '22 09:11


The program doesn't change the "base address" of the array. It's not even trying to.

What you pass to fn is the address of a chunk of 256 characters in memory. It is numerically identical to the pointer which str would decay to in other expressions, only differently typed: char (*)[256] rather than char *. Here, the array really stays an array -- applying the address operator to an array is one of the cases where an array does not decay to a pointer. Adding 1 to &str, for example, would increase the address numerically by 256. This is important for multidimensional arrays which, as we know, are actually one-dimensional arrays of arrays in C. Incrementing the first index of a "2-dimensional" array must advance the address to the start of the next "chunk" or "row".
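The stride difference between &str and the decayed str can be measured directly. The following is a small sketch with illustrative helper names (not part of the original program); the 3-by-4 grid is likewise just an assumed example.

```c
#include <assert.h>
#include <stddef.h>

/* Rows of a "2-dimensional" array are laid out back to back. */
static_assert(sizeof(int[3][4]) == 3 * sizeof(int[4]), "rows are contiguous");

/* Byte distance between p and p + 1 when p has type char (*)[256]. */
size_t stride_of_array_pointer(void)
{
    char str[256];
    char (*p)[256] = &str;   /* no decay: pointer to the whole array */
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Byte distance between q and q + 1 when q has type char *. */
size_t stride_of_decayed_pointer(void)
{
    char str[256];
    char *q = str;           /* decay: pointer to the first element */
    return (size_t)((q + 1) - q);
}

/* Byte distance between two consecutive rows of a 3x4 int array. */
size_t stride_of_row(void)
{
    int grid[3][4];
    return (size_t)((char *)&grid[1] - (char *)&grid[0]);
}
```

The first helper yields 256 and the second yields 1, although both pointers start at the same address; the third yields 4 * sizeof(int), the size of one row.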

Now the catch. As far as fn is concerned, the address you pass points to a location which contains another address. That is not true; it points to a sequence of characters. In the demo program below, where str is filled with 'A' characters beforehand, printing that byte sequence interpreted as a pointer reveals the 'A' byte values, 65 or 0x41.

fn, however, thinking that the memory pointed to contains an address, overwrites it with the address where "kj" resides in memory. Since there is enough memory allocated in str to hold an address, the assignment succeeds and results in a usable address at that location.
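The net effect described here, the first sizeof(char *) bytes of the array being overwritten with a pointer value while the array itself stays where it is, can be reproduced without the type mismatch by copying the bytes with memcpy. This is a sketch under that assumption; store_pointer_bytes, load_pointer_bytes and array_address_survives are hypothetical names, not functions from the original program.

```c
#include <assert.h>
#include <string.h>

static char *abc = "kj";

/* The buffer must be able to hold the bytes of one pointer. */
static_assert(sizeof(char[256]) >= sizeof(char *), "buffer too small for a pointer");

/* Well-defined stand-in for what fn ends up doing: copy the bytes of the
   pointer value abc over the first sizeof(char *) bytes of buf. */
void store_pointer_bytes(char *buf)
{
    memcpy(buf, &abc, sizeof abc);
}

/* Read the pointer value back out of the buffer, again via memcpy. */
char *load_pointer_bytes(const char *buf)
{
    char *p;
    memcpy(&p, buf, sizeof p);
    return p;
}

/* Returns 1 if the array keeps its address and merely its contents change. */
int array_address_survives(void)
{
    char str[256];
    strcpy(str, "AAAAAAAA");
    char *before = str;          /* address of the first element */

    store_pointer_bytes(str);    /* clobbers the leading 'A' bytes */

    return str == before && load_pointer_bytes(str) == abc;
}
```

memcpy copies raw bytes and has no alignment or aliasing requirements, so this variant avoids the problems of dereferencing a mistyped pointer while producing the same bytes in str.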

It should be noted that this is, of course, not guaranteed to work. The most common cause of failure would be alignment issues -- str is, I think, not required to be aligned properly for a pointer value. Besides, the call itself violates a constraint: the standard mandates that arguments to functions must be assignment-compatible with the parameter declarations, and arbitrary pointer types cannot be assigned to each other (one needs to go through void pointers for that, or cast). That is why gcc warns.

Edit: david.pfx pointed out that (even with a proper cast) the code invokes undefined behaviour. The standard requires that objects be accessed only through compatible lvalues (including dereferenced pointers); see section 6.5/7 of the latest public draft. When casting properly and compiling with gcc -fstrict-aliasing -Wstrict-aliasing=2 ... gcc warns about the "type punning". The rationale is that the compiler should be free to assume that incompatible pointers do not modify the same memory region; here it is not required to assume that fn changes the contents of str. This enables the compiler to optimize away reloads (e.g. from memory to register) which would otherwise be necessary. This plays a role under optimization, and it is a likely case where a debugging session fails to reproduce the error (namely if the program being debugged is compiled without optimization). That being said, I'd be surprised if a non-optimizing compiler produced unexpected results here, so I let the rest of the answer stand as is.

I have inserted a number of debug printfs to illustrate what's going on. A live example can be seen here: http://ideone.com/aL407L.

#include <stdio.h>
#include <string.h>
static char* abc = "kj";

// Helper function to print the first sizeof(char *) bytes a char pointer points to
void printBytes(const char *const caption, const char *const ptr)
{
    size_t i;
    printf("%s: {", caption);
    for (i = 0; i < sizeof(char *) - 1; ++i)
    {
        // Cast to unsigned char so bytes >= 0x80 are not sign-extended
        printf("0x%x,", (unsigned char)ptr[i]);
    }
    printf("0x%x ...}\n", (unsigned char)ptr[sizeof(char *) - 1]);
}

// What exactly does this function do?
void fn(char**s) {
    // %p expects a void *, hence the cast
    printf("Inside fn: Argument value is %p\n", (void *)s);
    printBytes("Inside fn: Bytes at address above are", (char *)s);

    // This would likely crash: *s is not a valid address.
    // printf("contents: ->%s<-\n", *s);

    *s = abc;
    printf("Inside fn: Bytes at address above after assignment\n");
    printBytes("           (should be address of \"kj\")", (char *)s);

    // Now *s holds a valid address (that of "kj").
    printf("Inside fn: Printing *s as string (should be kj): ->%s<-\n", *s);

}


int main() {
   char str[256];

   printf("size of ptr: %zu\n", sizeof(void *));
   strcpy(str, "AAAAAAAA"); // 9 defined bytes (8 'A's plus the terminating NUL)

   printf("addr of \"kj\": %p\n", (void *)abc);
   printf("str addr: %p (%p)\n", (void *)&str, (void *)str);
   printBytes("str contents before fn", str);

   printf("------------------------------\n");
   // Parameter type does not match! Illegal code
   // (6.5.16.1 of the latest public draft; incompatible
   // types for assignment).
   fn(&str);

   printf("------------------------------\n");

   printBytes("str contents after fn (i.e. abc -- note byte order!)", str);
   printf("str addr after fn -- still the same! --: %p (%p)\n", (void *)&str, (void *)str);

   return 0;
}
Peter - Reinstate Monica answered Nov 15 '22 09:11


What you have here is simply Undefined Behaviour.

The parameter to the function is declared as pointer-to-pointer-to-char. The argument passed to it is pointer-to-array-of-256-char. The standard permits converting one object pointer type to another (with a cast), but since the object that s points to is not a pointer-to-char, dereferencing the pointer is Undefined Behaviour.

n1570 §6.5.3.2/4:

If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

It is futile to speculate how Undefined Behaviour will play out on different implementations. It's just plain wrong.


Just to be clear, the UB is in this line:

*s=abc;

The pointer s does not point to an object of the correct type (char*), so the use of * is UB.
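For contrast, here is a sketch of a call that does give fn an object of the correct type. It assumes the caller is content with redirecting a separate char * variable, since the array itself cannot be redirected; redirect_pointer_not_array is an illustrative name that does not appear in the question.

```c
#include <string.h>

static char *abc = "kj";

/* Same fn as in the question: redirects the pointer that s points to. */
void fn(char **s)
{
    *s = abc;
}

/* Returns 1 if p was redirected to "kj" while str itself stayed untouched. */
int redirect_pointer_not_array(void)
{
    char str[256] = "AAAAAAAA";
    char *p = str;   /* a real char * object that initially points into str */

    fn(&p);          /* &p has type char **, matching the parameter */

    return p == abc
        && strcmp(p, "kj") == 0
        && strcmp(str, "AAAAAAAA") == 0;
}
```

Here *s designates a genuine char * object, so the assignment inside fn is well defined, and the contents and address of str are not affected.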

david.pfx answered Nov 15 '22 09:11