I was researching how to get the memory offset of a member of a class in C++ and came across this on Wikipedia:
In C++ code, you can not use offsetof to access members of structures or classes that are not Plain Old Data Structures.
I tried it out and it seems to work fine.
    #include <cstddef>   // offsetof
    #include <iostream>
    using namespace std;

    class Foo {
    private:
        int z;
        int func() { cout << "this is just filler" << endl; return 0; }
    public:
        int x;
        int y;
        Foo* f;

        bool returnTrue() { return false; }
    };

    int main()
    {
        cout << offsetof(Foo, x) << " " << offsetof(Foo, y) << " " << offsetof(Foo, f);
        return 0;
    }
I got a few warnings, but it compiled and when run it gave reasonable output:
    Laptop:test alex$ ./test
    4 8 12
I think I'm either misunderstanding what a POD data structure is or I'm missing some other piece of the puzzle. I don't see what the problem is.
Bluehorn's answer is correct, but for me it doesn't explain the reason for the problem in the simplest terms. The way I understand it is as follows:
If NonPOD is a non-POD class, then when you do:
    NonPOD np;
    np.field;
the compiler does not necessarily access the field by adding some offset to the base pointer and dereferencing. For a POD class, the C++ Standard constrains it to do that (or something equivalent), but for a non-POD class it does not. The compiler might instead read a pointer out of the object, add an offset to that value to give the storage location of the field, and then dereference. This is a common mechanism with virtual inheritance if the field is a member of a virtual base of NonPOD. But it is not restricted to that case. The compiler can do pretty much anything it likes. It could call a hidden compiler-generated virtual member function if it wants.
In the complex cases, it is obviously not possible to represent the location of the field as an integer offset. So offsetof is not valid on non-POD classes.
In cases where your compiler just so happens to store the object in a simple way (such as single inheritance, and normally even non-virtual multiple inheritance, and normally fields defined right in the class that you're referencing the object by as opposed to in some base class), then it will just so happen to work. There are probably cases which just so happen to work on every single compiler there is. This doesn't make it valid.
With simple inheritance, if B is derived from A, the usual implementation is that a pointer to B is just a pointer to A, with B's additional data stuck on the end:
    A* ---> field of A  <--- B*
            field of A
            field of B
With simple multiple inheritance, you generally assume that B's base classes (call 'em A1 and A2) are arranged in some order peculiar to B. But the same trick with the pointers can't work:
    A1* ---> field of A1
             field of A1
    A2* ---> field of A2
             field of A2
A1 and A2 "know" nothing about the fact that they're both base classes of B. So if you cast a B* to A1*, it has to point to the fields of A1, and if you cast it to A2* it has to point to the fields of A2. The pointer conversion operator applies an offset. So you might end up with this:
    A1* ---> field of A1  <---- B*
             field of A1
    A2* ---> field of A2
             field of A2
             field of B
             field of B
Then casting a B* to A1* doesn't change the pointer value, but casting it to A2* adds sizeof(A1) bytes. This is the "other" reason why, in the absence of a virtual destructor, deleting B through a pointer to A2 goes wrong. It doesn't just fail to call the destructor of B and A1, it doesn't even free the right address.
Anyway, B "knows" where all its base classes are, they're always stored at the same offsets. So in this arrangement offsetof would still work. The standard doesn't require implementations to do multiple inheritance this way, but they often do (or something like it). So offsetof might work in this case on your implementation, but it is not guaranteed to.
Now, what about virtual inheritance? Suppose B1 and B2 both have A as a virtual base. This makes them single-inheritance classes, so you might think that the first trick will work again:
    A* ---> field of A  <--- B1*      A* ---> field of A  <--- B2*
            field of A                        field of A
            field of B1                       field of B2
But hang on. What happens when C derives (non-virtually, for simplicity) from both B1 and B2? C must only contain 1 copy of the fields of A. Those fields can't immediately precede the fields of B1, and also immediately precede the fields of B2. We're in trouble.
So what implementations might do instead is:
    // an instance of B1 looks like this, and B2 similar
    A*  ---> field of A
             field of A
    B1* ---> pointer to A
             field of B1
Although I've indicated B1* pointing to the first part of the object after the A subobject, I suspect (without bothering to check) the actual address won't be there, it'll be the start of A. It's just that unlike simple inheritance, the offsets between the actual address in the pointer, and the address I've indicated in the diagram, will never be used unless the compiler is certain of the dynamic type of the object. Instead, it will always go through the meta-information to reach A correctly. So my diagrams will point there, since that offset will always be applied for the uses we're interested in.
The "pointer" to A could be a pointer or an offset, it doesn't really matter. In an instance of B1, created as a B1, it points to (char*)this - sizeof(A), and the same in an instance of B2. But if we create a C, it can look like this:
    A*  ---> field of A
             field of A
    B1* ---> pointer to A   // points to (char*)(this) - sizeof(A) as before
             field of B1
    B2* ---> pointer to A   // points to (char*)(this) - sizeof(A) - sizeof(B1)
             field of B2
    C* ----> pointer to A   // points to (char*)(this) - sizeof(A) - sizeof(B1) - sizeof(B2)
             field of C
             field of C
So to access a field of A using a pointer or reference to B2 requires more than just applying an offset. We must read the "pointer to A" field of B2, follow it, and only then apply an offset, because depending what class B2 is a base of, that pointer will have different values. There is no such thing as offsetof(B2,field of A): there can't be. offsetof will never work with virtual inheritance, on any implementation.
Short answer: offsetof is a feature that is only in the C++ standard for legacy C compatibility. Therefore it is basically restricted to the stuff that can be done in C. C++ supports only what it must for C compatibility.
As offsetof is basically a hack (implemented as a macro) that relies on the simple memory model of C, supporting it for every class would take a lot of freedom away from C++ compiler implementors in how they organize class instance layout.
The effect is that offsetof will often work (depending on the source code and the compiler used) in C++ even where it is not backed by the standard, except where it doesn't. So you should be very careful with offsetof usage in C++. Modern GCC and Clang will emit a warning if offsetof is used outside the standard (-Winvalid-offsetof).
Edit: As you asked for an example, the following might clarify the problem:
    #include <iostream>
    using namespace std;

    struct A { int a; };
    struct B : public virtual A { int b; };
    struct C : public virtual A { int c; };
    struct D : public B, public C { int d; };

    // Offset of field f, measured through the pointer i.
    #define offset_d(i,f)    (long(&(i)->f) - long(i))
    // "Static" offset: pretend an object of type t lives at address 1000.
    #define offset_s(t,f)    offset_d((t*)1000, f)

    #define dyn(inst,field) {\
        cout << "Dynamic offset of " #field " in " #inst ": "; \
        cout << offset_d(&i##inst, field) << endl; }

    #define stat(type,field) {\
        cout << "Static offset of " #field " in " #type ": "; \
        cout.flush(); \
        cout << offset_s(type, field) << endl; }

    int main()
    {
        A iA; B iB; C iC; D iD;
        dyn(A, a); dyn(B, a); dyn(C, a); dyn(D, a);
        stat(A, a); stat(B, a); stat(C, a); stat(D, a);
        return 0;
    }
This will crash when trying to locate the field a inside type B statically, while it works when an instance is available. This is because of the virtual inheritance, where the location of the base class is stored into a lookup table.
While this is a contrived example, an implementation could use a lookup table also to find the public, protected and private sections of a class instance. Or make the lookup completely dynamic (use a hash table for fields), etc.
The standard just leaves all possibilities open by restricting offsetof to POD (IOW: no way to use a hash table for POD structs... :) ).
Just another note: I had to reimplement offsetof (here: offset_s) for this example as GCC actually errors out when I call offsetof for a field of a virtual base class.