
In C, why can I see a value written past the end of an array in a different variable?

I've been spending my spare time lately on small experiments in C for fun, implementing things like simple algorithms and data structures.

But I ended up finding something interesting, and I still do not know why this result happens.

max_arr_count_index takes on whatever value I assign to arr[5], which is one element past the end of the array.

Can someone explain this to me? I know it should not be done: I assigned a value one index past the end of the array (here, arr[5] = 30 in the problem case), which is not safe and is undefined behavior according to the standard.

I am not going to do this in real code; I just want to understand what is happening under the hood.

LLVM and GCC give me the same result.

The code and results are below:

[No-problem case: I do not assign a value past the end of the array]

#include <stdio.h>

int arr[] = {11,33,55,77,88};
int max_arr_count_index = (sizeof(arr) / sizeof(arr[0]));

// print all
void print_all_arr(int* arr)
{
    // just print all arr datas regarding index.
    for(int i = 0; i < max_arr_count_index; i++) {
        printf("arr[%d] = %d \n", i, arr[i]);
    }
}

int main(int argc, const char * argv[]) {
    // insert code here...
    printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[before]The original array elements are :\n");
    print_all_arr(arr);
    arr[0] = 1;
    arr[1] = 2;
    arr[2] = 3;
    arr[3] = 4;
    arr[4] = 5;
    // arr[5] = 1000;
    printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[after]The array elements after :\n");

    print_all_arr(arr);

    return 0;
}

The no-problem result is below:

[before]max_arr_count_index : 5
[before]The original array elements are :
arr[0] = 11 
arr[1] = 33 
arr[2] = 55 
arr[3] = 77 
arr[4] = 88 
[after]max_arr_count_index : 5
[after]The array elements after :
arr[0] = 1 
arr[1] = 2 
arr[2] = 3 
arr[3] = 4 
arr[4] = 5 
Program ended with exit code: 0

[Problem case: I assign a value past the end of the array]

#include <stdio.h>

int arr[] = {11,33,55,77,88};
int max_arr_count_index = (sizeof(arr) / sizeof(arr[0]));

// print all
void print_all_arr(int* arr)
{
    // just print all arr datas regarding index.
    for(int i = 0; i < max_arr_count_index; i++) {
        printf("arr[%d] = %d \n", i, arr[i]);
    }
}

int main(int argc, const char * argv[]) {
    // insert code here...
    printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[before]The original array elements are :\n");
    print_all_arr(arr);
    arr[0] = 1;
    arr[1] = 2;
    arr[2] = 3;
    arr[3] = 4;
    arr[4] = 5;

    /* Point is this one. 
       If I assign arr[5] 30, then, max_arr_count_index is changed also as            
       30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
    */

    arr[5] = 30;

    /* Point is this one. 
       If I assign arr[5] 30, then, max_arr_count_index is changed also as            
       30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
    */

    printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[after]The array elements after arr[5] is assigned 30 :\n");

    print_all_arr(arr);

    return 0;
}

The result is below:

[before]max_arr_count_index : 5
[before]The original array elements are :
arr[0] = 11 
arr[1] = 33 
arr[2] = 55 
arr[3] = 77 
arr[4] = 88 
[after]max_arr_count_index : 30
[after]The array elements after arr[5] is assigned 30 :
arr[0] = 1 
arr[1] = 2 
arr[2] = 3 
arr[3] = 4 
arr[4] = 5 
arr[5] = 30 
arr[6] = 0 
arr[7] = 0 
arr[8] = 0 
arr[9] = 0 
arr[10] = 0 
arr[11] = 0 
arr[12] = 0 
arr[13] = 0 
arr[14] = 0 
arr[15] = 0 
arr[16] = 0 
arr[17] = 0 
arr[18] = 0 
arr[19] = 0 
arr[20] = 0 
arr[21] = 0 
arr[22] = 0 
arr[23] = 0 
arr[24] = 0 
arr[25] = 0 
arr[26] = 0 
arr[27] = 0 
arr[28] = 0 
arr[29] = 0 
Program ended with exit code: 0
boraseoksoon asked Nov 04 '16 16:11


2 Answers

As far as the C standard is concerned, this is undefined behaviour, and the compiler could make demons fly out of your nose and still be conforming.

But you want to go deeper, since you ask for "under the hood", so we essentially have to look at the assembler output. An excerpt (produced with gcc -g -o test test.c and objdump -S --disassemble test) is:

int main(int argc, const char * argv[]) {
 743:   55                      push   %rbp
 744:   48 89 e5                mov    %rsp,%rbp
 747:   48 83 ec 10             sub    $0x10,%rsp
 74b:   89 7d fc                mov    %edi,-0x4(%rbp)
 74e:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
    // insert code here...
    printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
 752:   8b 05 fc 08 20 00       mov    0x2008fc(%rip),%eax        # 201054 <max_arr_count_index>
 758:   89 c6                   mov    %eax,%esi
 75a:   48 8d 3d 37 01 00 00    lea    0x137(%rip),%rdi        # 898 <_IO_stdin_used+0x18>
 761:   b8 00 00 00 00          mov    $0x0,%eax
 766:   e8 35 fe ff ff          callq  5a0 <printf@plt>
    printf("[before]The original array elements are :\n");
 76b:   48 8d 3d 4e 01 00 00    lea    0x14e(%rip),%rdi        # 8c0 <_IO_stdin_used+0x40>
 772:   e8 19 fe ff ff          callq  590 <puts@plt>
    print_all_arr(arr);
 777:   48 8d 3d c2 08 20 00    lea    0x2008c2(%rip),%rdi        # 201040 <arr>
 77e:   e8 6d ff ff ff          callq  6f0 <print_all_arr>
    arr[0] = 1;
 783:   c7 05 b3 08 20 00 01    movl   $0x1,0x2008b3(%rip)        # 201040 <arr>
 78a:   00 00 00 
    arr[1] = 2;
 78d:   c7 05 ad 08 20 00 02    movl   $0x2,0x2008ad(%rip)        # 201044 <arr+0x4>
 794:   00 00 00 
    arr[2] = 3;
 797:   c7 05 a7 08 20 00 03    movl   $0x3,0x2008a7(%rip)        # 201048 <arr+0x8>
 79e:   00 00 00 
    arr[3] = 4;
 7a1:   c7 05 a1 08 20 00 04    movl   $0x4,0x2008a1(%rip)        # 20104c <arr+0xc>
 7a8:   00 00 00 
    arr[4] = 5;
 7ab:   c7 05 9b 08 20 00 05    movl   $0x5,0x20089b(%rip)        # 201050 <arr+0x10>
 7b2:   00 00 00 
    /* Point is this one. 
       If I assign arr[5] 30, then, max_arr_count_index is changed also as            
       30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
    */

    arr[5] = 30;
 7b5:   c7 05 95 08 20 00 1e    movl   $0x1e,0x200895(%rip)        # 201054 <max_arr_count_index>
 7bc:   00 00 00 
    /* Point is this one. 
       If I assign arr[5] 30, then, max_arr_count_index is changed also as            
       30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
    */

    printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
 7bf:   8b 05 8f 08 20 00       mov    0x20088f(%rip),%eax        # 201054 <max_arr_count_index>
 7c5:   89 c6                   mov    %eax,%esi
 7c7:   48 8d 3d 22 01 00 00    lea    0x122(%rip),%rdi        # 8f0 <_IO_stdin_used+0x70>
 7ce:   b8 00 00 00 00          mov    $0x0,%eax
 7d3:   e8 c8 fd ff ff          callq  5a0 <printf@plt>
    printf("[after]The array elements after insertion :\n");
 7d8:   48 8d 3d 39 01 00 00    lea    0x139(%rip),%rdi        # 918 <_IO_stdin_used+0x98>
 7df:   e8 ac fd ff ff          callq  590 <puts@plt>

    print_all_arr(arr);
 7e4:   48 8d 3d 55 08 20 00    lea    0x200855(%rip),%rdi        # 201040 <arr>
 7eb:   e8 00 ff ff ff          callq  6f0 <print_all_arr>

    return 0;
 7f0:   b8 00 00 00 00          mov    $0x0,%eax
}

As you can see, even at that level, the disassembler already knows that you are effectively setting max_arr_count_index. But why?

It is because the memory layout produced by GCC simply happens to be that way (we passed -g to gcc so that it embeds debug information, which lets objdump -S interleave the source lines; the <arr> and <max_arr_count_index> annotations come from the symbol table). You have a global array of five ints and a global int variable, declared one right after the other. In this build, the global int variable sits right behind the array in memory. Accessing the integer just past the end of the array therefore hits max_arr_count_index.

Remember that accessing element i of an array arr of, e.g., ints is (at least on all architectures I know) simply an access to the memory location at the byte address of arr plus sizeof(int)*i, where arr is the address of the first element.
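
The address arithmetic described above can be checked with a small sketch. The helper name byte_offset_of_element is hypothetical (it is not part of the question's program); it measures how many bytes lie between the start of an int array and element i, which should come out as i * sizeof(int). Forming the one-past-the-end pointer (i equal to the array length) is allowed as long as it is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the original program.
 * Returns the distance in bytes between base[0] and base[i].
 * The caller must keep i within 0..length (one past the end is
 * fine because the pointer is only formed, never dereferenced). */
size_t byte_offset_of_element(const int *base, size_t i)
{
    assert(base != NULL);
    const char *start = (const char *)base;      /* byte view of element 0 */
    const char *elem = (const char *)(base + i); /* byte view of element i */
    return (size_t)(elem - start);
}
```

For a five-element array, byte_offset_of_element(arr, 5) yields 5 * sizeof(int), which is 20 where int is 4 bytes. That matches the distance between 0x201040 (arr) and 0x201054 (max_arr_count_index) in the listing above.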

As said, this is undefined behaviour. GCC could just as well place the global int variable before the array, which would lead to different effects; the program might even terminate when it attempts to access arr[5] if there is no valid memory page at that location.
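
Whether a variable really sits directly behind the array can be checked without writing out of bounds. The following is a minimal sketch; is_directly_after is a hypothetical helper name, and any result it gives applies only to one compiler, one set of flags, and one build, since nothing in the standard fixes the order of globals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if `candidate` starts at the byte
 * immediately following the last element of an n-element int array,
 * and 0 otherwise. Only addresses are compared; nothing is read or
 * written through either pointer. */
int is_directly_after(const int *array, size_t n, const void *candidate)
{
    assert(array != NULL && candidate != NULL);
    uintptr_t end = (uintptr_t)(array + n); /* one past the end: legal to form */
    return end == (uintptr_t)candidate;
}
```

In the program from the question, is_directly_after(arr, 5, &max_arr_count_index) would return 1 for the build disassembled above (arr at 0x201040, max_arr_count_index at 0x201054) and could return 0 with another compiler or another set of options.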

Jonas Schäfer answered Nov 08 '22 18:11



Accessing an array out of bounds invokes undefined behavior. Nothing good can be expected in this case. The size of arr is 5, so the valid elements are arr[0] through arr[4].
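
One way to stay inside that range is to route writes through a bounds-checked helper. This is only a sketch under the assumption that the caller knows the array length; checked_store is a hypothetical name, not something from the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores value at array[index] only when the
 * index lies within 0..length-1. Returns 1 on success and 0 when the
 * index is out of range, in which case nothing is written. */
int checked_store(int *array, size_t length, size_t index, int value)
{
    assert(array != NULL);
    if (index >= length) {
        return 0; /* would be past the end: refuse instead of invoking UB */
    }
    array[index] = value;
    return 1;
}
```

With the array from the question, checked_store(arr, 5, 5, 30) returns 0 and leaves both arr and max_arr_count_index untouched.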

Setting UB aside for a moment, the explanation for the behavior

/* Point is this one. 
   If I assign arr[5] 30, then, max_arr_count_index is changed also as            
   30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
*/

could be that the variable max_arr_count_index is declared just after the array arr. The compiler may have allocated the memory for max_arr_count_index just past the last element of the array arr. For example, if arr[4] is at 0x100, then max_arr_count_index is allocated at 0x104, which is also the address just past the end of arr. Since &arr[5] is the same address as &max_arr_count_index, assigning a value to arr[5] writes that value to the address of max_arr_count_index. Please note that this is not necessarily exactly what is happening; it is an intuition for this behavior. Once there is UB, all bets are off.

haccks answered Nov 08 '22 19:11
