These days I have been spending my spare time in C, running small experiments and implementing simple algorithms and data structures for fun.
Along the way I ran into something interesting, and I still do not understand why it happens.
max_arr_count_index takes on whatever value I assign to arr[5], which is one element past the end of the array.
Can someone explain this to me? I know it should not be done. Assigning to the element one past the end of the array (arr[5] = 30 in the problem case) is not safe, and it is undefined behavior according to the standard.
I am not going to do this in real code, but I would like to understand what is happening under the hood.
LLVM and GCC both give me the same result.
The code and results are below:
[No-problem case: I do not assign a value past the end of the array]
#include <stdio.h>

int arr[] = {11,33,55,77,88};
int max_arr_count_index = (sizeof(arr) / sizeof(arr[0]));

// print all
void print_all_arr(int* arr)
{
    // just print every element of arr together with its index.
    for(int i = 0; i < max_arr_count_index; i++) {
        printf("arr[%d] = %d \n", i, arr[i]);
    }
}

int main(int argc, const char * argv[]) {
    // insert code here...
    printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[before]The original array elements are :\n");
    print_all_arr(arr);
    arr[0] = 1;
    arr[1] = 2;
    arr[2] = 3;
    arr[3] = 4;
    arr[4] = 5;
    // arr[5] = 1000;
    printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[after]The array elements after :\n");
    print_all_arr(arr);
    return 0;
}
The result for the no-problem case is below:
[before]max_arr_count_index : 5
[before]The original array elements are :
arr[0] = 11
arr[1] = 33
arr[2] = 55
arr[3] = 77
arr[4] = 88
[after]max_arr_count_index : 5
[after]The array elements after :
arr[0] = 1
arr[1] = 2
arr[2] = 3
arr[3] = 4
arr[4] = 5
Program ended with exit code: 0
[Problem case: I assign a value past the end of the array]
#include <stdio.h>

int arr[] = {11,33,55,77,88};
int max_arr_count_index = (sizeof(arr) / sizeof(arr[0]));

// print all
void print_all_arr(int* arr)
{
    // just print every element of arr together with its index.
    for(int i = 0; i < max_arr_count_index; i++) {
        printf("arr[%d] = %d \n", i, arr[i]);
    }
}

int main(int argc, const char * argv[]) {
    // insert code here...
    printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[before]The original array elements are :\n");
    print_all_arr(arr);
    arr[0] = 1;
    arr[1] = 2;
    arr[2] = 3;
    arr[3] = 4;
    arr[4] = 5;
    /* Point is this one.
    If I assign arr[5] 30, then, max_arr_count_index is changed also as
    30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
    */
    arr[5] = 30;
    /* Point is this one.
    If I assign arr[5] 30, then, max_arr_count_index is changed also as
    30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
    */
    printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
    printf("[after]The array elements after arr[5] is assigned 30 :\n");
    print_all_arr(arr);
    return 0;
}
Result is below :
[before]max_arr_count_index : 5
[before]The original array elements are :
arr[0] = 11
arr[1] = 33
arr[2] = 55
arr[3] = 77
arr[4] = 88
[after]max_arr_count_index : 30
[after]The array elements after arr[5] is assigned 30 :
arr[0] = 1
arr[1] = 2
arr[2] = 3
arr[3] = 4
arr[4] = 5
arr[5] = 30
arr[6] = 0
arr[7] = 0
arr[8] = 0
arr[9] = 0
arr[10] = 0
arr[11] = 0
arr[12] = 0
arr[13] = 0
arr[14] = 0
arr[15] = 0
arr[16] = 0
arr[17] = 0
arr[18] = 0
arr[19] = 0
arr[20] = 0
arr[21] = 0
arr[22] = 0
arr[23] = 0
arr[24] = 0
arr[25] = 0
arr[26] = 0
arr[27] = 0
arr[28] = 0
arr[29] = 0
Program ended with exit code: 0
C arrays don't have an end marker. It is your responsibility as the programmer to keep track of the allocated size of the array and to make sure you don't try to access an element outside the allocated size. If you do access an element outside the allocated size, the result is undefined behaviour.
So obviously, as far as the C standard is concerned, this is undefined behaviour, and the compiler could make demons fly out of your nose and it would be fine-ish.
But you want to go deeper, as you ask for "under the hood", so we essentially have to look at the assembler output. An excerpt (produced with gcc -g -o test test.c and objdump -S --disassemble test) is:
int main(int argc, const char * argv[]) {
743: 55 push %rbp
744: 48 89 e5 mov %rsp,%rbp
747: 48 83 ec 10 sub $0x10,%rsp
74b: 89 7d fc mov %edi,-0x4(%rbp)
74e: 48 89 75 f0 mov %rsi,-0x10(%rbp)
// insert code here...
printf("[before]max_arr_count_index : %d\n", max_arr_count_index);
752: 8b 05 fc 08 20 00 mov 0x2008fc(%rip),%eax # 201054 <max_arr_count_index>
758: 89 c6 mov %eax,%esi
75a: 48 8d 3d 37 01 00 00 lea 0x137(%rip),%rdi # 898 <_IO_stdin_used+0x18>
761: b8 00 00 00 00 mov $0x0,%eax
766: e8 35 fe ff ff callq 5a0 <printf@plt>
printf("[before]The original array elements are :\n");
76b: 48 8d 3d 4e 01 00 00 lea 0x14e(%rip),%rdi # 8c0 <_IO_stdin_used+0x40>
772: e8 19 fe ff ff callq 590 <puts@plt>
print_all_arr(arr);
777: 48 8d 3d c2 08 20 00 lea 0x2008c2(%rip),%rdi # 201040 <arr>
77e: e8 6d ff ff ff callq 6f0 <print_all_arr>
arr[0] = 1;
783: c7 05 b3 08 20 00 01 movl $0x1,0x2008b3(%rip) # 201040 <arr>
78a: 00 00 00
arr[1] = 2;
78d: c7 05 ad 08 20 00 02 movl $0x2,0x2008ad(%rip) # 201044 <arr+0x4>
794: 00 00 00
arr[2] = 3;
797: c7 05 a7 08 20 00 03 movl $0x3,0x2008a7(%rip) # 201048 <arr+0x8>
79e: 00 00 00
arr[3] = 4;
7a1: c7 05 a1 08 20 00 04 movl $0x4,0x2008a1(%rip) # 20104c <arr+0xc>
7a8: 00 00 00
arr[4] = 5;
7ab: c7 05 9b 08 20 00 05 movl $0x5,0x20089b(%rip) # 201050 <arr+0x10>
7b2: 00 00 00
/* Point is this one.
If I assign arr[5] 30, then, max_arr_count_index is changed also as
30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
*/
arr[5] = 30;
7b5: c7 05 95 08 20 00 1e movl $0x1e,0x200895(%rip) # 201054 <max_arr_count_index>
7bc: 00 00 00
/* Point is this one.
If I assign arr[5] 30, then, max_arr_count_index is changed also as
30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
*/
printf("[after]max_arr_count_index : %d\n", max_arr_count_index);
7bf: 8b 05 8f 08 20 00 mov 0x20088f(%rip),%eax # 201054 <max_arr_count_index>
7c5: 89 c6 mov %eax,%esi
7c7: 48 8d 3d 22 01 00 00 lea 0x122(%rip),%rdi # 8f0 <_IO_stdin_used+0x70>
7ce: b8 00 00 00 00 mov $0x0,%eax
7d3: e8 c8 fd ff ff callq 5a0 <printf@plt>
printf("[after]The array elements after insertion :\n");
7d8: 48 8d 3d 39 01 00 00 lea 0x139(%rip),%rdi # 918 <_IO_stdin_used+0x98>
7df: e8 ac fd ff ff callq 590 <puts@plt>
print_all_arr(arr);
7e4: 48 8d 3d 55 08 20 00 lea 0x200855(%rip),%rdi # 201040 <arr>
7eb: e8 00 ff ff ff callq 6f0 <print_all_arr>
return 0;
7f0: b8 00 00 00 00 mov $0x0,%eax
}
As you can see, even at that level, the disassembler already knows that you are effectively setting max_arr_count_index. But why?
It is because the memory layout produced by GCC simply happens to be that way (and we used -g with gcc to make it embed debug information, so that the disassembler knows which memory location belongs to which variable). You have a global array of five ints and a global int variable, declared right after each other. The global int variable simply sits right behind the array in memory. Accessing the integer right behind the end of the array thus gives you max_arr_count_index.
Remember that access to an element i of an array arr of, e.g., ints is (at least on all architectures I know) simply an access to the memory location at byte address arr+sizeof(int)*i, where arr is the address of the first element.
As said, this is undefined behaviour. GCC could also place the global int variable before the array, which would lead to different effects, possibly even the program terminating when attempting to access arr[5] if there is no valid memory page at that location.
Accessing an array out of bounds invokes undefined behavior. Nothing good can be expected in this case. The size of arr is 5, so you can access arr from arr[0] to arr[4].
Setting UB aside for a moment, the explanation for the behavior
/* Point is this one.
If I assign arr[5] 30, then, max_arr_count_index is changed also as
30. if I assign arr[5] 10000 max_arr_count_index is assigned 10000.
*/
could be that the variable max_arr_count_index is declared just after the array arr. The compiler may have allocated the memory for max_arr_count_index just past the last element of the array arr. For example, if arr[4] is at 0x100, then the memory for max_arr_count_index is allocated at 0x104, so the address just past the array arr is 0x104. Since &arr[5] is the same address as that of max_arr_count_index, assigning a value to arr[5] writes that value to the address of max_arr_count_index. Please note that this is not necessarily exactly what is happening. It is an intuition for this behavior. Once there is UB, all bets are off.