
C standard regarding pointer arithmetic outside arrays

I have read a lot about pointer arithmetic and undefined behavior (link, link, link, link, link). It always comes down to the same conclusion: pointer arithmetic is well defined only within an array, between array[0] and array[array_size] (a pointer one element past the end is valid with regard to the C standard, although it must not be dereferenced).
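To make that range concrete, here is a minimal sketch of the arithmetic the standard does define. The names N and span_of_array are made up for illustration and are not taken from any of the linked posts.

```c
#include <stddef.h>

enum { N = 10 };  /* hypothetical array length, chosen for illustration */

/* Distance in elements between the first element and the one-past-the-end
   pointer. Both pointers are derived from the same array, so the
   subtraction is well defined and yields N. */
ptrdiff_t span_of_array(void)
{
    static char array[N];
    char *first = &array[0];
    char *end = array + N;  /* may be formed and compared, never dereferenced */
    return end - first;
}
```

Forming array + N + 1 or array - 1 would already be undefined, even if the resulting pointer were never dereferenced.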

My question is: does this mean that when the compiler sees pointer arithmetic not related to any array (undefined behavior), it can emit whatever it wants (even nothing)? Or is it more of a high-level "undefined behavior", meaning you could reach unmapped memory, garbage data, etc., and there is no guarantee about the validity of the address?

In this example:

char test[10];
char * ptr = &test[0];
printf("test[-1] : %d", *(ptr-1));

By "undefined behavior", is it just that the value is not guaranteed at all (could be garbage, unmapped memory, etc.) while we can still say with certainty that we are accessing the memory address contiguous to the array, 1 byte before its start? Or is it "undefined behavior" in the sense that the compiler may simply not emit this code at all?

Another simple use case: you want to compute the in-memory size of a function. A naïve implementation could be the following code, assuming that the functions are emitted in the binary in the same order, contiguously, and without any padding in between.

#include <stdint.h>
#include <stdio.h>

void func1()
{}

void func2()
{}

int main()
{
  uint8_t * ptr1 = (uint8_t*) &func1;
  uint8_t * ptr2 = (uint8_t*) &func2;

  printf("Func 1 size : %td", ptr2-ptr1);

  return 0;
}

Since ptr1 and ptr2 do not point into the same array, the subtraction is undefined behavior. Again, does it mean the compiler might not emit that code? Or does "undefined behavior" mean that the subtraction may be meaningless depending on the system (functions not contiguous in memory, padding, etc.) but still occurs as expected? Is there any well-defined way to compute the difference between two unrelated pointers?

asked May 29 '19 12:05 by gagou7



1 Answer

The C standard doesn't define degrees of undefinedness for undefined behavior. If the behavior is undefined, all bets are off: the compiler may emit the code you expect, different code, or nothing at all.

Additionally, modern compilers track pointer provenance: the compiler checks whether a pointer, even one holding a possibly valid address, was derived correctly from the object it points to, and if it wasn't, the compiler may adjust program behavior.

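If you want mathematical pointer arithmetic without the possibility of UB, you can try casting your pointers to uintptr_t before doing the math. The arithmetic is then ordinary unsigned integer arithmetic, although the mapping between pointers and integers is implementation-defined.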
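A minimal sketch of that idea follows. The helper name address_distance is hypothetical, and the code assumes the implementation provides the optional uintptr_t type and uses a flat address space, so that the integer difference can be read as a byte distance. It takes object pointers only; the standard does not guarantee that a function pointer can be converted this way.

```c
#include <stdint.h>

/* Hypothetical helper: absolute distance in bytes between two addresses,
   computed on integers rather than on pointers. */
uintptr_t address_distance(const void *p, const void *q)
{
    uintptr_t a = (uintptr_t)p;  /* conversion result is implementation-defined */
    uintptr_t b = (uintptr_t)q;
    return a > b ? a - b : b - a;  /* unsigned arithmetic, no UB */
}
```

Calling address_distance(&x, &y) on two unrelated objects x and y is then not undefined, but whether the number means anything still depends on how the implementation lays out memory.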


An example of provenance tracking:

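#include <stdio.h>
int main(void)
{
    char a, b;
    /* %p expects a void *, hence the casts */
    printf("&a=%p\n", (void *)&a);
    printf("&b=%p\n", (void *)&b);
    printf("&a+1=%p\n", (void *)(&a+1));
    printf("&b+1=%p\n", (void *)(&b+1));
    /* compares pointers that were derived from different objects */
    printf("%d\n", &a+1==&b || &b+1==&a);
}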

On my machine, compiled with gcc -O2, this results in:

&a=0x7ffee4e36cae
&b=0x7ffee4e36caf
&a+1=0x7ffee4e36caf
&b+1=0x7ffee4e36cb0
0

I.e., &a+1 has the same numerical address as &b but is treated as unequal to &b because the addresses are derived from different objects.

(This gcc optimization is somewhat controversial. It doesn't carry across function call / translation unit boundaries, clang doesn't do it, and it's not necessary as 6.5.9p6 does allow for accidental pointer equality. See dbush's to this Keith Thompson's answer for more details.)

answered Oct 09 '22 18:10 by PSkocik