Pointer arithmetic using cast to "wrong" type

Question

I have an array of structs, and I have a pointer to a member of one of those structs. I would like to know which element of the array contains the member. Here are two approaches:

#include <array>
#include <cstddef>   // std::ptrdiff_t
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};

typedef std::array<xyz, 3> triangle;

// return which vertex the given coordinate is part of
int vertex_a(const triangle& tri, const float* coord)
{
    return reinterpret_cast<const xyz*>(coord) - tri.data();
}

int vertex_b(const triangle& tri, const float* coord)
{
    std::ptrdiff_t offset = reinterpret_cast<const char*>(coord) - reinterpret_cast<const char*>(tri.data());
    return offset / sizeof(xyz);
}

Here's a test driver:

#include <iostream>

int main()
{
    triangle tri{{{12.3, 45.6}, {7.89, 0.12}, {34.5, 6.78}}};
    for (const xyz& coord : tri) {
        std::cout
            << vertex_a(tri, &coord.x) << ' '
            << vertex_b(tri, &coord.x) << ' '
            << vertex_a(tri, &coord.y) << ' '
            << vertex_b(tri, &coord.y) << '\n';
    }
}

Both approaches produce the expected results:

0 0 0 0
1 1 1 1
2 2 2 2

But are they valid code?

In particular I wonder if vertex_a() might be invoking undefined behavior by casting float* y to xyz* since the result does not actually point to a struct xyz. That concern led me to write vertex_b(), which I think is safe (is it?).

Here's the code generated by GCC 6.3 with -O3:

vertex_a(std::array<xyz, 3ul> const&, float const*):
    movq    %rsi, %rax
    movabsq $-3689348814741910323, %rsi ; 0xCCC...CD
    subq    %rdi, %rax
    sarq    $3, %rax
    imulq   %rsi, %rax

vertex_b(std::array<xyz, 3ul> const&, float const*):
    subq    %rdi, %rsi
    movabsq $-3689348814741910323, %rdx ; 0xCCC...CD
    movq    %rsi, %rax
    mulq    %rdx
    movq    %rdx, %rax
    shrq    $5, %rax

Barry · Accepted Answer

Neither is valid per the standard.

In vertex_a, you're allowed to convert a pointer to xyz::x to a pointer to xyz because they're pointer-interconvertible:

Two objects a and b are pointer-interconvertible if [...] one is a standard-layout class object and the other is the first non-static data member of that object [...]

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.

But you can't do the cast from a pointer to xyz::y to a pointer to xyz. That operation is undefined.
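As a minimal sketch of the one case that is allowed, the code below casts a pointer to the first member back to its enclosing struct and then subtracts pointers within the array. The `point` type and the helper names `owner_of_x` and `index_of_x` are hypothetical; `point` stands in for `xyz` without the `std::string` member, because whether `xyz` itself is standard-layout depends on how the library implements `std::string`.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for xyz with only the two floats, so that the
// standard-layout requirement holds on every implementation.
struct point
{
    float x, y;
};

static_assert(std::is_standard_layout<point>::value,
              "pointer-interconvertibility needs a standard-layout class");

// x is the first non-static data member of point, so a pointer to it can be
// converted to a pointer to the enclosing point.
// Precondition: first_member really points at the x of some point.
const point* owner_of_x(const float* first_member)
{
    return reinterpret_cast<const point*>(first_member);
}

// Index of the element whose x member is pointed to.
// Precondition: x points into the array that starts at first.
std::ptrdiff_t index_of_x(const point* first, const float* x)
{
    return owner_of_x(x) - first;
}
```

Nothing comparable exists for `y`: it is not the first member, so the same cast applied to `&p.y` is the undefined case described above.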

In vertex_b, you're subtracting two pointers to const char. That operation is defined in [expr.add] as:

If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined

Your expressions don't point to elements of an array of char, so the behavior is undefined.
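If avoiding both problems matters more than speed, one possible alternative (not part of the question, and only a sketch) is to skip pointer arithmetic on the member pointer entirely and compare addresses element by element. The `point` type and the `find_vertex` name are hypothetical, and the search costs O(N) rather than O(1).

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for xyz; the name member is omitted for brevity.
struct point
{
    float x, y;
};

// Returns the index of the element that contains coord, or -1 if coord does
// not point at the x or y of any element. Only equality comparisons between
// pointers are used; no casts and no pointer subtraction.
template <std::size_t N>
int find_vertex(const std::array<point, N>& pts, const float* coord)
{
    for (std::size_t i = 0; i < N; ++i) {
        if (coord == &pts[i].x || coord == &pts[i].y) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Unlike `vertex_a` and `vertex_b`, this version also reports a pointer that does not belong to the array instead of silently returning a meaningless index.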

Lightness Races in Orbit · Answer

vertex_a indeed breaks the strict aliasing rule (none of your floats are valid xyzs, and in 50% of your example they're not even at the start of an xyz even if there's no padding).

vertex_b relies on, shall we say, creative interpretation of the standard. Though your cast to const char* is sound, performing arithmetic with it around the rest of the array is a little more dodgy. Historically I've concluded that this kind of thing has undefined behaviour, because "the object" in this context is the xyz, not the array. However, I'm leaning towards others' interpretation nowadays that this will always work, and wouldn't expect anything else in practice.

BJovke · Answer

vertex_b is completely fine. At most you may want to refine return offset / sizeof(xyz); since you're dividing a std::ptrdiff_t by a std::size_t and implicitly converting the result to int. By the book, this behavior is implementation-defined: std::ptrdiff_t is signed and std::size_t is unsigned, and the result of the division might be larger than INT_MAX (very unlikely) with a huge array on some platforms/compilers.

To cast away your worries, you can put assert()s and/or #errors which check PTRDIFF_MIN, PTRDIFF_MAX, SIZE_MAX, INT_MIN and INT_MAX, but I personally would not bother so much.
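A minimal sketch of that refinement, looking only at the division and the narrowing to int and leaving aside how the byte offset was obtained. The helper name `index_from_offset` is hypothetical. On common platforms, where std::size_t has at least the rank of std::ptrdiff_t, the mixed expression offset / sizeof(xyz) converts a negative offset to a large unsigned value before dividing, so the sketch casts the size to the signed type first.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: turn a byte offset into an element index.
// The division is done in signed arithmetic, and the result is checked
// against the range of int before narrowing.
int index_from_offset(std::ptrdiff_t offset, std::size_t elem_size)
{
    const std::ptrdiff_t index =
        offset / static_cast<std::ptrdiff_t>(elem_size);
    assert(index >= INT_MIN && index <= INT_MAX);
    return static_cast<int>(index);
}
```

In vertex_b, the last line would then read return index_from_offset(offset, sizeof(xyz));. The assert() disappears when NDEBUG is defined, so it serves as a debugging aid rather than a runtime guarantee.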

Jesse Cohen · Answer

Perhaps a more robust approach would involve changing the type signature to xyz::T* (T is a template argument so you can take xyz::x or xyz::y as needed) instead of float*.

Then you can use offsetof(struct xyz,T) to confidently compute the location of the start of the struct in a way that should be more resilient to future changes in its definition.

Then the rest follows as you are currently doing: once you have a pointer to the start of the struct finding its offset in the array is a valid pointer subtraction.

There is some pointer nastiness involved, but this approach is used in practice; see, for example, the container_of() macro in the Linux kernel: https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/067/6717/6717s2.html
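A rough sketch of this idea in the spirit of container_of(), under several assumptions. offsetof takes a member name rather than a type or template parameter, so the sketch uses one hypothetical helper per member (`owner_from_x`, `owner_from_y`) instead of a template argument. The `point` type stands in for `xyz` without the `std::string` member so that offsetof is applied to a standard-layout class. The other answers disagree on whether the standard sanctions the `const char*` arithmetic used here, so treat this as the commonly used technique, not as code the standard is known to bless.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for xyz (standard-layout, so offsetof is supported).
struct point
{
    float x, y;
};

// Step back from the member's address by its offset to reach the struct.
inline const point* owner_from_x(const float* x)
{
    return reinterpret_cast<const point*>(
        reinterpret_cast<const char*>(x) - offsetof(point, x));
}

inline const point* owner_from_y(const float* y)
{
    return reinterpret_cast<const point*>(
        reinterpret_cast<const char*>(y) - offsetof(point, y));
}

// Once the start of the struct is known, the index is a subtraction of two
// pointers to point within the same array.
template <std::size_t N>
std::ptrdiff_t vertex_of_y(const std::array<point, N>& pts, const float* y)
{
    return owner_from_y(y) - pts.data();
}
```

Because each helper names its member explicitly, adding or reordering members changes the offsets automatically, which is the resilience this answer is after.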

Tags:

c++

language-lawyer

pointer-arithmetic

undefined-behavior

John Zwinck
