Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What treatment can a pointer undergo and still be valid?

Which of the following ways of treating and trying to recover a C pointer, is guaranteed to be valid?

1) Cast to void pointer and back

int f(int *a) {
    void *b = a;
    a = b;
    return *a;
}

2) Cast to appropriately sized integer and back

int f(int *a) {
    uintptr_t b = a;
    a = (int *)b;
    return *a;
}

3) A couple of trivial integer operations

int f(int *a) {
    uintptr_t b = a;
    b += 99;
    b -= 99;
    a = (int *)b;
    return *a;
}

4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged

int f(int *a) {
    uintptr_t b = a;
    char s[32];
    // assume %lu is suitable
    sprintf(s, "%lu", b);
    b = strtoul(s);
    a = (int *)b;
    return *a;
}

5) More indirect integer operations that will leave the value unchanged

int f(int *a) {
    uintptr_t b = a;
    for (uintptr_t i = 0;; i++)
        if (i == b) {
            a = (int *)i;
            return *a;
        }
}

Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying something similar to case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop. Yet each case looks like an unobjectionable extension of the previous one.

Where is the line drawn between a valid case and an invalid one?

Added based on discussion in comments: while I still can't find the post that inspired case 5, I don't remember what type of pointer was involved; in particular, it might have been a function pointer, which might be why that case demonstrated invalid code whereas my case 5 is valid code.

Second addition: okay, here's another source that says there is a problem, and this one I do have a link to. https://www.cl.cam.ac.uk/~pes20/cerberus/notes30.pdf - the discussion about pointer provenance - says, and backs up with evidence, that no, if the compiler loses track of where a pointer came from, it's undefined behavior.

like image 763
rwallace Avatar asked Dec 29 '16 14:12

rwallace


Video Answer


1 Answers

According to the C11 draft standard:

Example 1

Valid, by §6.5.16.1, even without an explicit cast.

Example 2

The intptr_t and uintptr_t types are optional. Assigning a pointer to an integer requires an explicit cast (§6.5.16.1), although gcc and clang will only warn you if you don’t have one. With those caveats, the round-trip conversion is valid by §7.20.1.4. ETA: John Bellinger brings up that the behavior is only specified when you do an intermediate cast to void* both ways. However, both gcc and clang allow the direct conversion as a documented extension.

Example 3

Safe, but only because you’re using unsigned arithmetic, which cannot overflow, and are therefore guaranteed to get the same object representation back. An intptr_t could overflow! If you want to do pointer arithmetic safely, you can convert any kind of pointer to char* and then add or subtract offsets within the same structure or array. Remember, sizeof(char) is always 1. ETA: The standard guarantees that the two pointers compare equal, but your link to Chisnall et al. gives examples where compilers nevertheless assume the two pointers do not alias each other.

Example 4

Always, always, always check for buffer overflows whenever you read from and especially whenever you write to a buffer! If you can mathematically prove that overflow cannot happen by static analysis? Then write out the assumptions that justify that, explicitly, and assert() or static_assert() that they haven’t changed. Use snprintf(), not the deprecated, unsafe sprintf()! If you remember nothing else from this answer, remember that!

To be absolutely pedantic, the portable way to do this would be to use the format specifiers in <inttypes.h> and define the buffer length in terms of the maximum value of any pointer representation. In the real world, you would print pointers out with the %p format.

The answer to the question you intended to ask is yes, though: all that matters is that you get back the same object representation. Here’s a less contrived example:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int i = 1;
    const uintptr_t u = (uintptr_t)(void*)&i;
    uintptr_t v;

    memcpy( &v, &u, sizeof(v) );
    int* const p = (int*)(void*)v;

    assert(p == &i);
    *p = 2;
    printf( "%d = %d.\n", i, *p ); 

    return EXIT_SUCCESS;
}

All that matter are the bits in your object representation. This code also follows the strict aliasing rules in §6.5. It compiles and runs fine on the compilers that gave Chisnall et al trouble.

Example 5

This works, same as above.

One extremely pedantic footnote that will never ever be relevant to your coding: some obsolete esoteric hardware has one’s-complement or sign-and-magnitude representation of signed integers, and on these, there might be a distinct value of negative zero that might or might not trap. On some CPUs, this might be a valid pointer or null pointer representation distinct from positive zero. And on some CPUs, positive and negative zero might compare equal.

PS

The standard says:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

Furthermore, if the two array objects are consecutive rows of the same multidimensional array, one past the end of the first row is a valid pointer to the start of the next row. Therefore, even a pathological implementation that deliberately sets out to cause as many bugs as the standard allows could only do so if your manipulated pointer compares equal to the address of an array object, in which case the implementation might in theory decide to interpret it as one-past-the-end of some other array object instead.

The intended behavior is clearly that the pointer comparing equal both to &array1+1 and to &array2 is equivalent to both: it means to let you compare it to addresses within array1 or dereference it to get array2[0]. However, the Standard does not actually say that.

PPS

The standards committee has addressed some of these issues and proposes that the C standard explicitly add language about pointer provenance. This would nail down whether a conforming implementation is allowed to assume that a pointer created by bit manipulation does not alias another pointer.

Specifically, the proposed corrigendum would introduce pointer provenance and allow pointers with different provenance not to compare equal. It would also introduce a -fno-provenance option, which would guarantee that any two pointers compare equal if and only if they have the same numeric address. (As discussed above, two object pointers that compare equal alias each other.)

like image 135
Davislor Avatar answered Oct 26 '22 10:10

Davislor