Is adding to a "char *" pointer UB when it doesn't actually point to a char array?

C++17 [expr.add]/4 says:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n; otherwise, the behavior is undefined.
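A minimal sketch of what the quoted rule permits, using an ordinary int array (a hypothetical example, not taken from the question). For an array x with n elements, x + j may be formed for 0 ≤ j ≤ n, and x + n is the one-past-the-end pointer. Anything beyond that range is undefined, so the code below only mentions it in a comment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical array used only to illustrate [expr.add]/4.
constexpr std::size_t kN = 4;
int x[kN] = {10, 20, 30, 40};

// Distance from x[0] to the one-past-the-end position.
std::ptrdiff_t span_length() {
    int* first = x;           // points to x[0]
    int* last = x + kN;       // points to the hypothetical x[4]: may be formed, not dereferenced
    // int* bad = x + kN + 1; // would be undefined behavior: i + j > n
    return last - first;      // element distance between the two pointers
}

// Reads x[i + j] through pointer arithmetic. The caller must keep
// i + j < n so that the resulting pointer is also dereferenceable.
int read_at(std::size_t i, std::size_t j) {
    assert(i + j < kN);
    int* p = x + i;           // P points to x[i]
    return *(p + j);          // P + J points to x[i + j]
}
```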

#include <cstddef> // offsetof

struct Foo {
    float x, y, z;
};

Foo f;
char *p = reinterpret_cast<char*>(&f) + offsetof(Foo, z); // (*)
*reinterpret_cast<float*>(p) = 42.0f;

Does the line marked with (*) have UB? reinterpret_cast<char*>(&f) doesn't point to a char array, but to a float, so it should be UB according to the cited paragraph. But if it is UB, then offsetof's usefulness would be very limited.
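For context, here is a small sketch of what offsetof itself guarantees for Foo, independent of the pointer-arithmetic question. The names off_x and off_z are hypothetical. offsetof is only guaranteed for standard-layout classes (other types are conditionally-supported), and it yields a compile-time byte count from the start of the object.

```cpp
#include <cstddef>
#include <type_traits>

struct Foo {
    float x, y, z;
};

// offsetof is only guaranteed to work for standard-layout classes.
static_assert(std::is_standard_layout_v<Foo>, "Foo must be standard-layout");

// Byte offsets from the start of a Foo object (integral constant expressions).
constexpr std::size_t off_x = offsetof(Foo, x);
constexpr std::size_t off_z = offsetof(Foo, z);

// The first member of a standard-layout class is at offset 0, and later
// members (same access control) are at strictly higher addresses.
static_assert(off_x == 0, "first member is at offset 0");
static_assert(off_z >= 2 * sizeof(float), "z follows x and y");
```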

Is it UB? If not, why not?

asked Nov 26 '17 16:11 by geza


1 Answer

Any interpretation that disallows the intended usage of offsetof must be wrong:

#include <assert.h>
#include <stddef.h>
struct S { float a, b, c; };

const size_t idx_S[] = {
    offsetof(struct S, a),
    offsetof(struct S, b),
    offsetof(struct S, c),
};

float read_S(struct S *sp, unsigned int idx)
{
    assert(idx < 3);
    return *(float *)(((char *)sp) + idx_S[idx]); // intended to be valid
}
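In C++ (not C), the same indexed access can be written without offsetof by using a table of pointers to data members. This is a swapped-in technique that sidesteps the char* arithmetic question rather than answering it. The sketch below assumes the same struct S; the table name members_S is hypothetical.

```cpp
#include <cassert>

struct S { float a, b, c; };

// Hypothetical table of pointers to data members, replacing idx_S.
// s.*members_S[i] names the member directly, so no char* arithmetic occurs.
constexpr float S::* members_S[] = { &S::a, &S::b, &S::c };

float read_S(const S& s, unsigned int idx)
{
    assert(idx < 3);
    return s.*members_S[idx];
}
```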

However, any interpretation that allows one to step past the end of an explicitly-declared array must also be wrong:

#include <assert.h>
#include <stddef.h>
struct S { float a[2]; float b[2]; };

static_assert(offsetof(struct S, b) == sizeof(float)*2,
    "padding between S.a and S.b -- should be impossible");

float read_S(struct S *sp, unsigned int idx)
{
    assert(idx < 4);
    return sp->a[idx]; // undefined behavior if idx >= 2,
                       // reading past end of array
}
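For comparison, here is a sketch of one well-defined way to get the "flat index" behaviour the second example is reaching for (an illustration, not part of the original answer). It selects the array explicitly, so every subscript stays within the array it is applied to.

```cpp
#include <cassert>

struct S { float a[2]; float b[2]; };

// Picks a or b explicitly, so each subscript is in bounds for its own
// array and no pointer ever steps past the end of a.
float read_S(const S& s, unsigned int idx)
{
    assert(idx < 4);
    return idx < 2 ? s.a[idx] : s.b[idx - 2];
}
```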

And we are now on the horns of a dilemma, because the wording in both the C and C++ standards that was intended to disallow the second case probably also disallows the first case.

This is commonly known as the "what is an object?" problem. People, including members of the C and C++ committees, have been arguing about this and related issues since the 1990s. There have been multiple attempts to fix the wording, and to the best of my knowledge none has succeeded (in the sense of rendering all existing "reasonable" code definitely conforming while still allowing all existing "reasonable" optimizations).

(Note: All of the above code is written as it would be written in C to emphasize that the same problem exists in both languages, and can be encountered without the use of any C++ constructs.)

answered Sep 22 '22 11:09 by zwol