I have an array of structs, and I have a pointer to a member of one of those structs. I would like to know which element of the array contains the member. Here are two approaches:
#include <array>
#include <cstddef>  // std::ptrdiff_t
#include <string>
struct xyz
{
    float x, y;
    std::string name;
};
typedef std::array<xyz, 3> triangle;
// return which vertex the given coordinate is part of
int vertex_a(const triangle& tri, const float* coord)
{
    return reinterpret_cast<const xyz*>(coord) - tri.data();
}
int vertex_b(const triangle& tri, const float* coord)
{
    std::ptrdiff_t offset = reinterpret_cast<const char*>(coord) - reinterpret_cast<const char*>(tri.data());
    return offset / sizeof(xyz);
}
Here's a test driver:
#include <iostream>
int main()
{
    triangle tri{{{12.3, 45.6}, {7.89, 0.12}, {34.5, 6.78}}};
    for (const xyz& coord : tri) {
        std::cout
            << vertex_a(tri, &coord.x) << ' '
            << vertex_b(tri, &coord.x) << ' '
            << vertex_a(tri, &coord.y) << ' '
            << vertex_b(tri, &coord.y) << '\n';
    }
}
Both approaches produce the expected results:
0 0 0 0
1 1 1 1
2 2 2 2
But are they valid code?
In particular, I wonder whether vertex_a() might be invoking undefined behavior by casting float* y to xyz*, since the result does not actually point to a struct xyz. That concern led me to write vertex_b(), which I think is safe (is it?).
Here's the code generated by GCC 6.3 with -O3:
vertex_a(std::array<xyz, 3ul> const&, float const*):
movq %rsi, %rax
movabsq $-3689348814741910323, %rsi ; 0xCCC...CD
subq %rdi, %rax
sarq $3, %rax
imulq %rsi, %rax
vertex_b(std::array<xyz, 3ul> const&, float const*):
subq %rdi, %rsi
movabsq $-3689348814741910323, %rdx ; 0xCCC...CD
movq %rsi, %rax
mulq %rdx
movq %rdx, %rax
shrq $5, %rax
Neither is valid per the standard.
In vertex_a, you're allowed to convert a pointer to xyz::x to a pointer to xyz because they're pointer-interconvertible:
Two objects a and b are pointer-interconvertible if [...] one is a standard-layout class object and the other is the first non-static data member of that object [...]
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.
But you can't do the cast from a pointer to xyz::y to a pointer to xyz. That operation is undefined.
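As a minimal sketch of the one case this answer calls valid, the function below accepts only pointers to the x member. The name vertex_from_x is hypothetical, and the static_assert encodes an assumption: whether xyz is standard-layout depends on how the library implements std::string.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <type_traits>

struct xyz
{
    float x, y;
    std::string name;
};
typedef std::array<xyz, 3> triangle;

// Assumption: xyz is standard-layout on this implementation, which depends on
// how the standard library implements std::string. Fail the build otherwise.
static_assert(std::is_standard_layout<xyz>::value,
              "xyz must be standard-layout for the cast below");

// Hypothetical helper: px must point to the x member of an element of tri.
int vertex_from_x(const triangle& tri, const float* px)
{
    // x is the first non-static data member, so it is pointer-interconvertible
    // with its enclosing xyz; the cast yields a pointer to that xyz.
    const xyz* owner = reinterpret_cast<const xyz*>(px);
    // Both operands now point to elements of the same array of xyz.
    return static_cast<int>(owner - tri.data());
}
```

Passing &tri[i].y to this function would reintroduce the undefined cast described above, so it does not replace vertex_a for the y member.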
In vertex_b, you're subtracting two pointers to const char. That operation is defined in [expr.add] as:
If the expressions P and Q point to, respectively, elements x[i] and x[j] of the same array object x, the expression P - Q has the value i − j; otherwise, the behavior is undefined.
Your expressions don't point to elements of an array of char, so the behavior is undefined.
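If strict conformance matters more than speed, one way to sidestep both the cast and the pointer subtraction is to compare addresses element by element, since equality comparison between pointers does not have the same-array requirement. This is only a sketch; vertex_by_search and its -1 "not found" result are assumptions, not part of the original code.

```cpp
#include <array>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};
typedef std::array<xyz, 3> triangle;

// Hypothetical helper: returns the index of the element whose x or y member
// coord points to, or -1 if coord points to neither member of any element.
int vertex_by_search(const triangle& tri, const float* coord)
{
    for (std::size_t i = 0; i < tri.size(); ++i) {
        // Only pointer equality is used; no cast and no pointer subtraction.
        if (coord == &tri[i].x || coord == &tri[i].y) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The cost is a linear scan instead of a single subtraction, which is negligible for three elements but grows with the array.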
vertex_a indeed breaks the strict aliasing rule (none of your floats are valid xyzs, and in 50% of your example they're not even at the start of an xyz even if there's no padding).
vertex_b relies on, shall we say, a creative interpretation of the standard. Though your cast to const char* is sound, performing arithmetic with it around the rest of the array is a little more dodgy. Historically I've concluded that this kind of thing has undefined behaviour, because "the object" in this context is the xyz, not the array. However, I'm leaning towards others' interpretation nowadays that this will always work, and wouldn't expect anything else in practice.
vertex_b is completely fine. You may only need to refine return offset / sizeof(xyz); since you're dividing a std::ptrdiff_t by a std::size_t and implicitly converting the result to int.
By the book, this behavior is implementation-defined. std::ptrdiff_t is signed and std::size_t is unsigned, and the result of the division might be larger than INT_MAX (very unlikely) with a huge array size on some platforms/compilers.
To cast away your worries, you can put assert()s and/or #errors which check PTRDIFF_MIN, PTRDIFF_MAX, SIZE_MAX, INT_MIN and INT_MAX, but I personally would not bother so much.
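A hedged sketch of that refinement follows: it keeps the division signed and checks the range before narrowing to int. The name vertex_b_checked is hypothetical, and the function still performs the same const char* subtraction whose validity the other answers dispute.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};
typedef std::array<xyz, 3> triangle;

// Hypothetical variant of vertex_b with explicit conversions and range checks.
int vertex_b_checked(const triangle& tri, const float* coord)
{
    const std::ptrdiff_t offset =
        reinterpret_cast<const char*>(coord) -
        reinterpret_cast<const char*>(tri.data());
    // Convert sizeof to a signed type so the division is not done in unsigned
    // arithmetic, where a negative offset would wrap to a huge value.
    const std::ptrdiff_t index =
        offset / static_cast<std::ptrdiff_t>(sizeof(xyz));
    // The index must name an element of tri, so it always fits in an int.
    assert(offset >= 0);
    assert(index < static_cast<std::ptrdiff_t>(tri.size()));
    return static_cast<int>(index);
}
```

The assertions disappear when NDEBUG is defined, so a release build would need a different policy for out-of-range pointers.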
Perhaps a more robust approach would involve changing the type signature to xyz::T* (T is a template argument so you can take xyz::x or xyz::y as needed) instead of float*. Then you can use offsetof(struct xyz, T) to confidently compute the location of the start of the struct in a way that should be more resilient to future changes in its definition.
Then the rest follows as you are currently doing: once you have a pointer to the start of the struct, finding its offset in the array is a valid pointer subtraction.
There is some pointer nastiness involved, but this is an approach that is used in practice. For example, see the container_of() macro in the Linux kernel: https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/067/6717/6717s2.html
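One possible reading of this suggestion, in the spirit of container_of(), is sketched below. Because offsetof needs a member name rather than a type, the sketch passes the member's byte offset as a template argument; vertex_at is a hypothetical name. It assumes offsetof is supported for xyz (guaranteed only for standard-layout types), and stepping back through a const char* is the same "pointer nastiness" whose strict validity is debated above.

```cpp
#include <array>
#include <cstddef>
#include <string>

struct xyz
{
    float x, y;
    std::string name;
};
typedef std::array<xyz, 3> triangle;

// Hypothetical helper: Offset is the byte offset, within xyz, of the member
// that coord points to, e.g. offsetof(xyz, y).
template <std::size_t Offset>
int vertex_at(const triangle& tri, const float* coord)
{
    // Step back from the member to the start of the enclosing xyz.
    const char* member = reinterpret_cast<const char*>(coord);
    const xyz* owner = reinterpret_cast<const xyz*>(member - Offset);
    // Subtract two pointers to xyz to get the element index.
    return static_cast<int>(owner - tri.data());
}
```

A call would look like vertex_at<offsetof(xyz, y)>(tri, &tri[1].y); the caller is responsible for pairing each pointer with the matching member offset.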