Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Methods with covariant return types crashes on VC++

The following code seems to run fine when compiled with clang or gcc (on macOS), but crashes when compiled with MS Visual C++ 2017. On the latter, the foo_clone object appears to be corrupted and the program crashes with an access violation on the line foo_clone->get_identifier().

It does work on VC++ if I remove the covariant return types (all clone-methods return IDO*), or when std::enable_shared_from_this is removed, or when all inheritance is made virtual.

Why does it work with clang/gcc but not with VC++?

#include <memory>
#include <iostream>

class IDO {
public:
    virtual ~IDO() = default;

    virtual const char* get_identifier() const = 0;

    virtual IDO* clone() const = 0;
};

class DO
    : public virtual IDO
    , public std::enable_shared_from_this<DO> 
{
public:
    const char* get_identifier() const override { return "ok"; }
};

class D : public virtual IDO, public DO {
    D* clone() const override {
        return nullptr;
    }
};

class IA : public virtual IDO {};

class Foo : public IA, public D {
public:
    Foo* clone() const override {
        return new Foo();
    }
};

int main(int argc, char* argv[]) {
    Foo* foo = new Foo();
    Foo* foo_clone = foo->clone();
    foo_clone->get_identifier();
}

Message:

Exception thrown at 0x00007FF60940180B in foo.exe: 0xC0000005: Access violation reading location 0x0000000000000004.

like image 761
spkersten Avatar asked Nov 28 '17 08:11

spkersten


1 Answers

This appears to be a miscompilation by VC++. It going away when the enable_shared_from_this isn't there is a red herring; the problem is just masked.

Some background: resolving overridden functions in C++ generally happens via vtables. However, in the presence of multiple and virtual inheritance and co-variant return types, there are some challenges that must be met, and different ways of meeting them.

Consider:

Foo* foo = new Foo();
IDO* ido = foo;
D* d = foo;

foo->clone(); // must call Foo::clone() and return a Foo*
ido->clone(); // must call Foo::clone() and return an IDO*
d->clone(); // must call Foo::clone() and return a D*

Keep in mind that Foo::clone() returns a Foo* no matter what, and conversion from Foo* to IDO* or D* is not a simple no-op. Within the complete Foo object, the IDO subobject lives at offset 32 (assuming MSVC++ and 64-bit compilation), and the D subobject lives at offset 8. To get from a Foo* to a D* means adding 8 to the pointer, and getting an IDO* actually means loading information from the Foo* where exactly the IDO subobject is located.

However, let's look at the vtable generated for all these classes. A vtable for IDO has this layout:

0: destructor
1: const char* get_identifier() const
2: IDO* clone() const

A vtable for D has this layout:

0: destructor
1: const char* get_identifier() const
2: IDO* clone() const
3: D* clone() const

Slot 2 is there because the base class IDO has this function. Slot 3 is there because this function exists too. Could we omit this slot, and instead generate additional code at callsites to convert from the IDO* to the D*? Perhaps, but that would be less efficient.

A vtable for Foo looks like this:

0: destructor
1: const char* get_identifier() const
2: IDO* clone() const
3: D* clone() const
4: Foo* clone() const
5: Foo* clone() const

Again, it inherits the layout of D and appends its own slots. I actually have no idea why there are two new slots - it's probably just a suboptimal algorithm that sticks around for compatibility reasons.

Now, what do we put into these slots for a concrete object of type Foo? Slots 4 and 5 simple get Foo::clone(). But that function returns a Foo*, so it's not suitable for slots 2 and 3. For these, the compiler creates stubs (called thunks) that call the main version and convert the result, i.e. the compiler creates something like this for slot 3:

D* Foo::clone$D() const {
  Foo* real = clone();
  return static_cast<D*>(real);
}

Now we get to the miscompilation: for some reason, the compiler, upon seeing this call:

foo->clone();

calls not slot 4 or 5, but slot 3. But slot 3 returns a D*! The code then just continues to use that D* as a Foo*, or in other words, you get the same kind of behavior as if you had done:

Foo* wtf = reinterpret_cast<Foo*>(
  reinterpret_cast<char*>(foo_clone) + 8);

This obviously isn't going to end well.

Specifically, what happens is that in the call foo_clone->get_identifier(); the compiler wants to cast the Foo* foo_clone to an IDO* (get_identifier requires its this pointer to be an IDO* because it was originally declared in IDO). As I mentioned before, the exact position of the IDO object within any Foo object is not fixed; it depends on the full type of the object (it is 32 if the full object is a Foo, but it might be something else if it was a class derived from Foo). To do the conversion, the compiler must therefore load the offset from within the object. Specifically, it can load the "virtual base pointer" (vbptr) located at offset 0 of any Foo object, which points to a "virtual base table" (vbtable), which contains the offset.

But remember, we have a corrupted Foo* that already points to offset 8 of the real object. So we access offset 0 of offset 8, and what's there? Well, as it happens, what's there is the weak_ptr from the enable_shared_from_this object, and it is null. So we get null for the vbptr, and the attempt to dereference it to get the object crashes. (The offset to the virtual base is stored at offset 4 in the vbtable, which is why the crash address you get is 0x000...004.)

If you remove all the covariant shenanigans, the vtable shrinks to a nice single entry for clone() and the miscompilation doesn't appear.

But why does the problem go away if you remove the enable_shared_from_this? Well, because then the thing at offset 8 is not some null pointer inside a weak_ptr, but the vbptr of the DO subobject. (Generally speaking, every branch of a inheritance graph gets its own vbptr. IA has one that Foo shares, and DO has one that D shares.) And that vbptr contains the information needed to convert a D* to an IDO*. Our Foo* is really a D* in disguise, so everything just happens to work out correctly.

Appendix

The MSVC++ compiler has an undocumented option to dump object layouts. Here is its output for Foo with the enable_shared_from_this:

class Foo   size(40):
    +---
 0  | +--- (base class IA)
 0  | | {vbptr}
    | +---
 8  | +--- (base class D)
 8  | | +--- (base class DO)
 8  | | | +--- (base class std::enable_shared_from_this<class DO>)
 8  | | | | ?$weak_ptr@VDO@@ _Wptr
    | | | +---
24  | | | {vbptr}
    | | +---
    | +---
    +---
    +--- (virtual base IDO)
32  | {vfptr}
    +---

Foo::$vbtable@IA@:
 0  | 0
 1  | 32 (Food(IA+0)IDO)

Foo::$vbtable@D@:
 0  | -16
 1  | 8 (Food(DO+16)IDO)

Foo::$vftable@:
    | -32
 0  | &Foo::{dtor}
 1  | &DO::get_identifier
 2  | &IDO* Foo::clone
 3  | &D* Foo::clone
 4  | &Foo* Foo::clone
 5  | &Foo* Foo::clone

Foo::clone this adjustor: 32
Foo::{dtor} this adjustor: 32
Foo::__delDtor this adjustor: 32
Foo::__vecDelDtor this adjustor: 32
vbi:       class  offset o.vbptr  o.vbte fVtorDisp
             IDO      32       0       4 0

Here it is without:

class Foo   size(24):
    +---
 0  | +--- (base class IA)
 0  | | {vbptr}
    | +---
 8  | +--- (base class D)
 8  | | +--- (base class DO)
 8  | | | {vbptr}
    | | +---
    | +---
    +---
    +--- (virtual base IDO)
16  | {vfptr}
    +---

Foo::$vbtable@IA@:
 0  | 0
 1  | 16 (Food(IA+0)IDO)

Foo::$vbtable@D@:
 0  | 0
 1  | 8 (Food(DO+0)IDO)

Foo::$vftable@:
    | -16
 0  | &Foo::{dtor}
 1  | &DO::get_identifier
 2  | &IDO* Foo::clone
 3  | &D* Foo::clone
 4  | &Foo* Foo::clone
 5  | &Foo* Foo::clone

Foo::clone this adjustor: 16
Foo::{dtor} this adjustor: 16
Foo::__delDtor this adjustor: 16
Foo::__vecDelDtor this adjustor: 16
vbi:       class  offset o.vbptr  o.vbte fVtorDisp
             IDO      16       0       4 0

Here's some cleaned-up disassembly of the return-adjusting clone shim:

      mov         rcx,qword ptr [this]  
      call        Foo::clone ; the real clone  
      cmp         rax,0 ; null pointer remains null pointer
      je          fin
      add         rax,8 ; otherwise, add the offset to the D*
      jmp         fin
fin:  ret

Here's some cleanup-up disassembly of the faulting call:

mov         rax,qword ptr [foo]  
mov         rcx,rax  
mov         rax,qword ptr [rax] ; load vbptr  
movsxd      rax,dword ptr [rax+4] ; load offset to IDO subobject 
add         rcx,rax  ; add offset to Foo* to get IDO*
mov         rax,qword ptr [rcx]  ; load vtbl
call        qword ptr [rax+24]  ; call function at position 3 (D* clone)

And here's some cleaned-up disassembly of the crashing call:

mov         rax,qword ptr [foo_clone]  
mov         rcx,rax  
mov         rax,qword ptr [rax] ; load vbptr, loads null in the crashing case
movsxd      rax,dword ptr [rax+4] ; load offset to IDO subobject, crashes
add         rcx,rax  ; add offset to Foo* to get IDO*
mov         rax,qword ptr [rcx]  ; load vtbl
call        qword ptr [rax+8]  ; call function at position 1 (get_identifier)
like image 136
Sebastian Redl Avatar answered Sep 23 '22 17:09

Sebastian Redl