Is adding to a "char *" pointer UB, when it doesn't actually point to a char array?

Tags:

language-lawyer

C++17 (expr.add/4) says:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n; otherwise, the behavior is undefined.

struct Foo {
    float x, y, z;
};

Foo f;
char *p = reinterpret_cast<char*>(&f) + offsetof(Foo, z); // (*)
*reinterpret_cast<float*>(p) = 42.0f;

Does the line marked with (*) have UB? reinterpret_cast<char*>(&f) doesn't point to a char array, but to a float, so it should be UB according to the cited paragraph. But if it is UB, then offsetof's usefulness would be limited.

Is it UB? If not, why not?


asked Nov 26 '17 16:11

geza

1 Answer

Any interpretation that disallows the intended usage of offsetof must be wrong:

#include <assert.h>
#include <stddef.h>
struct S { float a, b, c; };

const size_t idx_S[] = {
    offsetof(struct S, a),
    offsetof(struct S, b),
    offsetof(struct S, c),
};

float read_S(struct S *sp, unsigned int idx)
{
    assert(idx < 3);
    return *(float *)(((char *)sp) + idx_S[idx]); // intended to be valid
}
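For readers who want to avoid the contested arithmetic altogether, here is a minimal sketch of one possible workaround, not something the answer itself proposes. It assumes S is trivially copyable, copies the object's bytes into a real unsigned char array, and does the pointer arithmetic inside that array, where expr.add clearly applies. The name read_S_copy is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct S { float a, b, c; };

const std::size_t idx_S[] = {
    offsetof(S, a),
    offsetof(S, b),
    offsetof(S, c),
};

// Hypothetical helper (not from the original answer).
// Assumes S is trivially copyable, so its bytes may be copied with memcpy.
float read_S_copy(const S *sp, unsigned int idx)
{
    assert(idx < 3);

    // A genuine array of unsigned char: arithmetic on it is within bounds.
    unsigned char bytes[sizeof(S)];
    std::memcpy(bytes, sp, sizeof(S));

    // Copy the member's bytes back into a float instead of dereferencing
    // a reinterpreted pointer.
    float out;
    std::memcpy(&out, bytes + idx_S[idx], sizeof out);
    return out;
}
```

For an S initialized as {1.0f, 2.0f, 3.0f}, read_S_copy(&s, 2) yields 3.0f. The cost is a copy of the whole struct, which a compiler may or may not optimize away.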

However, any interpretation that allows one to step past the end of an explicitly-declared array must also be wrong:

#include <assert.h>
#include <stddef.h>
struct S { float a[2]; float b[2]; };

static_assert(offsetof(struct S, b) == sizeof(float)*2,
    "padding between S.a and S.b -- should be impossible");

float read_S(struct S *sp, unsigned int idx)
{
    assert(idx < 4);
    return sp->a[idx]; // undefined behavior if idx >= 2,
                       // reading past end of array
}
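For contrast, a sketch of how the same lookup could be written so that no access ever leaves its own array. This is an illustration only; the name read_S_checked is hypothetical and does not appear in the original answer.

```cpp
#include <cassert>
#include <cstddef>

struct S { float a[2]; float b[2]; };

// Hypothetical variant of read_S: picks the array first, then indexes
// within that array's bounds, so a[idx] is never evaluated with idx >= 2.
float read_S_checked(const S *sp, unsigned int idx)
{
    assert(idx < 4);
    return idx < 2 ? sp->a[idx] : sp->b[idx - 2];
}
```

This version does not rely on a and b being adjacent in memory, so it stays valid whatever the layout turns out to be.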

We are now on the horns of a dilemma, because the wording in both the C and C++ standards that was intended to disallow the second case probably also disallows the first.

This is commonly known as the "what is an object?" problem. People, including members of the C and C++ committees, have been arguing about this and related issues since the 1990s. There have been multiple attempts to fix the wording, and to the best of my knowledge none has succeeded (in the sense that all existing "reasonable" code is rendered definitely conforming and all existing "reasonable" optimizations are still allowed).

(Note: All of the above code is written as it would be written in C to emphasize that the same problem exists in both languages, and can be encountered without the use of any C++ constructs.)
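In C++ specifically, a table of pointers to data members can sometimes stand in for a table of offsetof values, which avoids the char * arithmetic in question. The following is a rough sketch under that assumption; the names member_S and read_S_member are hypothetical, and this option has no C equivalent.

```cpp
#include <cassert>

struct S { float a, b, c; };

// Table of pointers to data members, playing the role of the offsetof table.
float S::* const member_S[] = { &S::a, &S::b, &S::c };

// Hypothetical helper: selects a member by index without any byte arithmetic.
float read_S_member(const S *sp, unsigned int idx)
{
    assert(idx < 3);
    return sp->*member_S[idx];
}
```

This only helps when the set of members is known at compile time; it does not cover code that receives a raw byte offset from elsewhere.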

answered Sep 22 '22 11:09

zwol
