
Why does the dereference operator preserve polymorphism (late binding) in C++?

It is well known that "Virtuals are resolved at run time only if the call is made through a reference or pointer." So I was surprised to find that the dereference operator also preserves dynamic binding.

#include <iostream>
using namespace std;

struct B {
  virtual void say() { cout << "Hello B" << endl; }
};

struct D : B {
  void say() override { cout << "Hello D" << endl; }
};

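int main() {
    D *ptr = new D();
    B *p = ptr;
    (*p).say();   // virtual call made through the lvalue *p
    delete ptr;   // deleted through D*, because B has no virtual destructor
    return 0;
}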

The output is

Hello D

Question: How does the compiler handle the dereference operator *?

I thought dereferencing was resolved at compile time. So when the compiler dereferences the pointer p, it should assume that p points to an object of type B. For example, the following code

D temp = (*p);

complains

error: no viable conversion from 'B' to 'D'
Peng Zhang asked May 19 '14 22:05



3 Answers

On the surface of it, this is an interesting question, because absent an overload of unary *, dereferencing results in an lvalue of type B, not a reference type. However, even starting to go down this line of reasoning is a red herring: expressions never have reference types, as the reference is immediately dropped and determines the value category. In that sense, the unary * operator is very much like a function returning a reference.

In fact, the answer is that your initial assertion is incorrect: dynamic dispatch does not at all rely on references or pointers. It is references and pointers that enable you to prevent slicing, but once you have some expression referring to your polymorphic object, any old function call will do.

Also consider:

#include <iostream>

struct Base
{
   virtual void foo() { std::cout << "Base::foo()\n"; }
   void bar() { foo(); }
};

struct Derived : Base
{
   virtual void foo() { std::cout << "Derived::foo()\n"; }
};

int main()
{
   Derived d;
   d.bar();    // output: "Derived::foo()"
}


Lightness Races in Orbit answered Oct 23 '22 00:10


The dereferencing/indirection operator * doesn't itself actually do anything. For example, when you write just *p; the compiler may ignore this line if p is just a pointer.

What the * does is change the semantics of read and write:

int  i = 42;
int* p = &i;

*p = 0;
 p = 0;

The *p = 0 means write to the object p points to. Note that in C++, an object is a region of storage.

Similarly,

auto x =  p; // copies the address
auto y = *p; // copies the value

Here, the read from *p means read the value of the object p points to.

The value category of *p only determines which operations the C++ language allows on expressions of the form *p.

References are really just pointers with syntactic sugar. So trying to explain what *p does by using references is circular reasoning.


Let's consider slightly changed classes:

#include <iostream>

class Base
{
private:
    int b = 21;
public:
    virtual void say() { std::cout << "Hello B(" <<b<< ")\n"; }
};

class Derived : public Base
{
private:
    int d = 1729;
public:
    virtual void say() { std::cout << "Hello D(" <<d<< ")\n"; }
};


Derived d;
Derived *pd = &d;
Base* pb = pd;

One weird, but I think allowed memory layout looks like this:

$$2d graphics mode$$

        +-Derived------------+
        |    +-Base---+----+ |
        | d  | vtable | b  | |
        |    +--------+----+ |
        +----^---------------+
        ^    | pb
        | pd


$$1d graphics mode$$

name    #    /../   |d       |vtable          |b       |
address #   /../    |0 1 2 3 |4 5 6 7 8 9 1011|12131415|16
                     ^        ^
                     | pd     | pb

pd == some address
pb == pd + 4 bytes

When we convert from Derived* to Base*, the compiler knows the offset of the Base subobject inside a Derived object, and can compute the address value for this subobject.

The vtable pointer is stored, for single nonvirtual inheritance, in the least derived type that has a virtual function. It is changed by derived classes roughly as seen in this implementation/simulation.

When we now call

pb->say()

which is defined in the C++ Standard as

(*pb).say()

the compiler knows from the type of pb (which is Base*), that we call a virtual function. Therefore, the (*pb).say() means look up the entry for say in the vtable of the object pb points to, and call it. The part of the object pb points to is what allows polymorphism.

On the other hand, when we copy

Base b = *pb;

What happens is that the vtable pointer is not copied. This would be dangerous, because Derived::say might try to access Derived::d. But this data member isn't available in an object of type Base, which we're currently creating (in the copy ctor of Base).

dyp answered Oct 23 '22 00:10


After doing some research, I think I have a reasonable (at least to me) answer for this question to share.

Assumptions (excerpted or paraphrased from the book "C++ Primer 5th"):

  1. The dereference operator * on a pointer p, i.e. (*p) returns the object to which p points.
  2. An object of a derived class D : public B logically has two parts: a sub-object of class B, and a part holding the members of class D. (This explains "slicing".)

The description of C++'s virtual mechanism that I rely on in this answer comes from the article "12.5 The Virtual Table", which I found convincing. The figure below conceptually shows the *__vptr and the VTables for the code in the question.

[Figure: the *__vptr and the VTables of B and D]

My explanation.

D obj_d;
D* ptr = &obj_d; // ptr is a pointer to type D,
                 // and points to obj_d, an object of type D

B* p = ptr;      // p is a pointer to type B and p points to the B subobject of obj_d.

(*p).say();

Since p is a pointer to type B, (*p) returns an object of type B, i.e., the sub-object of (*ptr). Name this object of type B as obj_b.

However, the *__vptr of obj_b points to the VTable of D. Thus, when say() is called on it, the entry for say() in the VTable of D points to the function that prints "Hello D".

Experiment which supports my explanation.

 (&(*p))->say();  // outputs "Hello D"

Further Note

When a method is called on an object x, whether polymorphism (dynamic binding of member functions) takes effect depends on which VTable the *__vptr of that object points to.

If we write B obj_x(*p); (&obj_x)->say(); the output is "Hello B". This is because obj_x is a completely newly constructed object of type B using the synthesized copy constructor of struct B. Thus, the *__vptr of obj_x points to the VTable of B.

Thanks to help from dyp, we have a simulation of the virtual dispatch in this question. In case the page is removed from Coliru, I stored the code here.

Peng Zhang answered Oct 22 '22 23:10