I am learning C++ and am learning about the virtual keyword. I have scoured the internet trying to understand it, to no avail. I went into my editor and did the following experiment, expecting it to print the base message twice (because I was under the impression that the virtual keyword is needed to override functions). However, it printed two different messages. Can someone explain why we need the virtual keyword if we can simply override functions and still seemingly get polymorphic behavior? Perhaps someone can help me and other people in the future understand virtual vs. overriding. (The output I am getting is "I am the base" followed by "I am the derived".)
#include <iostream>
using namespace std;

class Base {
public:
    void printMe() {
        cout << "I am the base" << endl;
    }
};

class Derived : public Base {
public:
    void printMe() {
        cout << "I am the derived" << endl;
    }
};

int main() {
    Base a;
    Derived b;
    a.printMe();
    b.printMe();
    return 0;
}
The virtual keyword can be used when declaring overriding functions in a derived class, but it is unnecessary; overrides of virtual functions are always virtual. Virtual functions in a base class must be defined unless they are declared using the pure-specifier.
To override a method in C++, the base class method must be declared virtual; the override specifier on the derived method is optional but recommended. The difference is that if the base method is not virtual, a derived method with the same name hides it rather than overriding it.
You cannot override a non-virtual or static member function. The overridden base method must be virtual (possibly pure virtual, or itself an override of a virtual function further up the hierarchy). In C++ the overriding method and the virtual base method are not required to have the same access level, although keeping them the same is usually clearer.
When you override a function you don't technically need to write either virtual or override. The original base class declaration needs the keyword virtual to mark it as virtual. In the derived class the function is virtual by way of having the same name and signature as the base class function.
virtual means, "This is NOT REALLY a C function, i.e. a series of pushes of arguments onto the stack, followed by a jump to a SINGLE unchanging address of the function body."
Instead, it's this other beast that looks in a table at runtime for the address of the function body to execute. In typical implementations, each class with virtual functions has its own table with one entry per virtual function, and each object carries a hidden pointer to its class's table. The table of function pointers is called a vtable. This is a RUNTIME mechanism for polymorphism that injects extra code to do this lookup and then dispatch to the appropriate specialized version of the function body.
Furthermore, to benefit from this vtable dispatch mechanism, you access your object through a POINTER (or a reference) to a base class, as opposed to direct access through a variable of the object's own type, i.e. Foo* foo{makeFoo()}; foo->someMethod() vs. Loo loo{}; loo.someMethod(). So another dereference right from the get-go is required to use this technique.
Here's the neat part: these pointers can point to any objects of derived classes as well, so if you have a class FooChild that inherits from FooParent, you can use a FooParent* to point to a FooParent OR a FooChild.
When the call is made to the method, instead of just doing the normal C thing of preparing the arguments on the stack, then jumping to the body of barMethod(), it does some runtime work first to look up one of SEVERAL DIFFERENT implementations of barMethod that are individualized per class. That lookup goes through the vtable. Each class in the class hierarchy has its own vtable entry for barMethod that says where the function body REALLY is for that particular class, since they can have different ones, EVEN IF we are using FooParent* to point to instances of any of them.
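To make the FooParent* idea concrete, here is a minimal sketch. FooParent, FooChild and barMethod come from the text above; the std::string return type, the callBar helper and the demoDispatch self-check are assumptions added only for illustration.

```cpp
#include <cassert>
#include <string>

struct FooParent {
    virtual ~FooParent() = default;
    // virtual: the body that runs is chosen at run time from the object's dynamic type.
    virtual std::string barMethod() const { return "parent"; }
};

struct FooChild : FooParent {
    std::string barMethod() const override { return "child"; }
};

// Hypothetical helper: it only knows about FooParent*, yet reaches either body.
std::string callBar(const FooParent* p) { return p->barMethod(); }

// Hypothetical self-check showing both pointer and reference dispatch.
void demoDispatch() {
    FooParent parent;
    FooChild child;
    assert(callBar(&parent) == "parent");
    assert(callBar(&child) == "child");   // FooParent* pointing at a FooChild
    const FooParent& ref = child;         // references dispatch virtually too
    assert(ref.barMethod() == "child");
}
```

The same callBar works for every class derived from FooParent without being recompiled or edited, which is the whole point of the mechanism.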
But here's why we would want to do that in the first place: suppose virtual does not exist. And you, the programmer, want to handle a bunch of objects that come from a class hierarchy. Well, you'd end up pretty much coding the same thing that the compiler injects for you by hand! In order to pass in your instances of these various classes into some function that you write to do stuff with them, you need a singularly sized type for the function call code to work. So, use pointers because pointers are always the same size on your machine (these days), no matter how differently sized the objects they point to are. Okay. So pointers it is. That's a sort of type erasure that is required to use virtual.
Then you need a switch statement or something to branch on the particular class it turns out to point to. But that'd be if you coded it by hand for each variation you wrote. That's silly. So quickly you'd realize you'd be better off with a table of pointers to your various versions of barMethod() to call. Then you could always just look up that same table from every variation, instead of rewriting handcoded switch statements and such. So you'd do that. You'd implement a table in which you have pointers to different barMethod()s for each of the classes in the hierarchy deriving from FooParent. They'd all have the SAME SIGNATURE (parameter list, return value, etc.), but DIFFERENT BODIES, for each class.
You'd assign each class an integer ID or something like that and use that as the offset into the table. Maybe FooChildA and FooChildB are two different classes that both derive from FooParent for example, so you'd assign A to 0 and B to 1, or something like that. Then use those as offsets to jump into the table and get your pointer. That's how lookup tables work in general. Once you got your pointer, you'd push all the arguments onto the stack, and then jump to that pointer. So virtual is just a keyword that instructs the compiler to inject all this crazy high-level code into your code for you so you don't have to manually do it.
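The hand-rolled table described above could be sketched roughly like this. Everything here is hypothetical scaffolding (the ClassId enum, the id field, barMethodA, barMethodB, kBarTable, callBar and demoManualTable are invented names), and it mirrors the thought experiment rather than what a compiler actually emits: real implementations typically store a hidden pointer to a per-class table in each object instead of an integer ID.

```cpp
#include <cassert>
#include <string>

// Hypothetical hand-assigned class IDs, used as offsets into the table.
enum ClassId { kFooChildA = 0, kFooChildB = 1 };

// Every object carries its ID so the right table entry can be found at run time.
struct FooParent { ClassId id; };

// Same signature, different bodies, one per "class".
std::string barMethodA(const FooParent*) { return "A"; }
std::string barMethodB(const FooParent*) { return "B"; }

using BarFn = std::string (*)(const FooParent*);

// The hand-written lookup table, indexed by ClassId.
const BarFn kBarTable[] = { barMethodA, barMethodB };

// Manual dispatch: fetch the function pointer, then call through it.
std::string callBar(const FooParent* obj) {
    return kBarTable[obj->id](obj);
}

// Hypothetical self-check.
void demoManualTable() {
    FooParent a{kFooChildA};
    FooParent b{kFooChildB};
    assert(callBar(&a) == "A");
    assert(callBar(&b) == "B");
}
```

Writing virtual asks the compiler to generate and maintain the equivalent of kBarTable and callBar for you.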
The problem is, it's RUNTIME polymorphism, when often COMPILE time polymorphism can be used instead, via templates etc. A virtual call adds an extra indirection through the vtable and usually prevents the compiler from inlining the function. That's actually just fine for non-hot loops. But for code on a hot path that runs millions of times per second, that overhead can become unacceptable. In many such cases, you could do the equivalent of all that table lookup stuff at compile time instead, using templates or metaprogramming, so that runtime can be blazingly fast.
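As a rough illustration of the compile-time alternative, here is a sketch using a function template. FooChildA and FooChildB are borrowed from the discussion above, but here they are deliberately unrelated types with no base class; the std::string return type, callBar and demoStaticDispatch are assumptions for the example.

```cpp
#include <cassert>
#include <string>

// No common base class and no virtual: the two types are unrelated.
struct FooChildA { std::string barMethod() const { return "A"; } };
struct FooChildB { std::string barMethod() const { return "B"; } };

// The compiler instantiates one callBar per T, so the call target is known
// at compile time and no table lookup is needed.
template <typename T>
std::string callBar(const T& obj) { return obj.barMethod(); }

// Hypothetical self-check.
void demoStaticDispatch() {
    assert(callBar(FooChildA{}) == "A");
    assert(callBar(FooChildB{}) == "B");
}
```

The trade-off is that the concrete type must be known at each call site, so you cannot put a FooChildA and a FooChildB behind one pointer type the way a virtual hierarchy allows.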
As for override, that confusing mess should have been in the language from the get-go and should be in the same textual position as the virtual keyword. Sadly, both of those "shoulds" were not done. So in the old days, you'd declare barMethod() in the most parent of the class hierarchy as virtual, and then also declare barMethod() in the derived classes as virtual. At some point this got to be super annoying due to weird bugs. The feature honestly isn't intuitive and is hard to teach or even remember after YEARS of knowing about it.
So we added override as well, as a hint to the compiler so we can catch bugs. It means: "Not only is this function virtual, so do all that crazy vtable dispatching stuff, but in addition, this is a DERIVED re-definition of barMethod(), so the compiler can check that you matched the parameters etc. perfectly with the version in the parent class." Without this check, if you accidentally failed to match the derived version's parameter list exactly with the parent's version, instead of overriding the parent version, the compiler would just say, "Oh, another totally new virtual member function hierarchy is starting, with different parameters, and this is the root. Must be a new overload set."
I realize that's a super confusing statement. But basically, if you have barMethod() and barMethod(int) and barMethod(int, char*) and so forth, these are all DIFFERENT functions with no real relationship to each other. It's as if each had a different name. You can think of it that way in your head. It's essentially how the compiler itself thinks of it, with name mangling. So if you then made them virtual, you might think that declaring them in various classes in the hierarchy would put them into a single member function virtual hierarchy as well. But it doesn't. If you mark them with the override keyword instead, the compiler would notice that barMethod(int) override and barMethod(int, char*) override have no relationship to anything in FooParent, which only has barMethod() with no parameters. But they are supposedly overriding something. ¡COMPILER ERROR! And that's good. You want that compiler error, or else your code goes out to customers and looks like it's working but absolutely isn't.
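Here is a small sketch of the silent bug that override is meant to catch. FooParent and barMethod come from the text; FooChildTypo, FooChildFixed, callBar, demoOverrideCheck and the std::string return type are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <string>

struct FooParent {
    virtual ~FooParent() = default;
    virtual std::string barMethod() { return "parent"; }
};

struct FooChildTypo : FooParent {
    // Different parameter list: this starts a NEW virtual function and hides
    // FooParent::barMethod() instead of overriding it. It compiles (some
    // compilers may emit a warning about the hidden function).
    virtual std::string barMethod(int) { return "child(int)"; }
    // Declaring it as `std::string barMethod(int) override` instead would be
    // rejected, because FooParent has no virtual barMethod(int).
};

struct FooChildFixed : FooParent {
    std::string barMethod() override { return "child"; }  // checked by the compiler
};

std::string callBar(FooParent& p) { return p.barMethod(); }

// Hypothetical self-check.
void demoOverrideCheck() {
    FooChildTypo typo;
    FooChildFixed fixed;
    assert(callBar(typo) == "parent");   // the intended override never happened
    assert(callBar(fixed) == "child");
}
```

With FooChildTypo the program builds and runs, but calls through FooParent quietly use the parent body, which is exactly the "looks like it's working but isn't" case.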
The point of virtual is to allow you to use a SINGLE POINTER TYPE to represent any instances of an entire hierarchy of classes, but do different things for each of them, potentially. That wouldn't happen if the programmer didn't make sure ALL of the derived redefinitions really override the root version. And override makes sure they aren't accidentally creating new virtual function hierarchy roots.
In modern C++, we have decided it was too annoying to write both virtual and override, and that it always made it harder to visually grep which barMethod()s were the root version, and which ones were derived. And so they said, "you can drop the virtual keyword for the derived redefinitions and JUST use override." This is considered the only proper way to speak nowadays.
struct FooParent
{
    // The root has virtual
    virtual void barMethod(){ /* body */ } // or `=0` for "pure virtual"
};

// Original way of doing it. Just use virtual again, but this isn't the root now. This is a derived class.
struct FooChild_OldSchool : FooParent
{
    virtual void barMethod(); // Total trashmouth. Bug prone.
};

struct FooChild_OverrideDays : FooParent
{
    virtual void barMethod() override; // Naughty mouth. Using both.
};

struct FooChild_NonTrashyWay2020 : FooParent
{
    void barMethod() override; // Prim and proper mouth. Using only override in the derived class.
};
Bizarrely though, override sits in a different location syntactically, AFTER the parameter list, instead of before it. As far as I can tell this is really illogical. I really wish that we would fix this and allow override to go in the same place virtual does, at the beginning of the declaration, or better yet, let virtual go where override does, after the parameter list. As it is now, it's annoyingly inconsistent and confusing, imo. And I say all that because I believe these things make it unteachable if we don't admit they are warts. Because when you are learning a new language, you really need a more fluent speaker to say, "hey this is weird and warty. Don't worry about it. It's not because you're dumb. It's just because our language is evolved and wonky."
I wish it was like this...
// Wishful syntax only: none of this is valid C++.
struct FooChild_HowIWishItWas : FooParent
{
    override void barMethod();
};

// OR EVEN BETTER! Allow us to change the location of virtual!
struct FooParent_HowIWishItWasEvenMore
{
    void barMethod() virtual;
};
But it isn't. That's maybe how you can think of it internally though, and then just remember to add this weird wonkiness syntactically when you're actually typing the code. Wonder whether a paper on this would survive 5 minutes. Hmm.
Consider the following example. The important line to illustrate the need for virtual and override is c->printMe();. Note that the type of c is Base*, however due to polymorphism it is correctly able to call the overridden method from the derived class. The override keyword allows the compiler to enforce that a derived class method matches the signature of a base class's method that is marked virtual. If the override keyword is added to a derived class function, that function does not also need the virtual keyword in the derived class as the virtual is implied.
#include <iostream>

class Base {
public:
    virtual void printMe() {
        std::cout << "I am the base" << std::endl;
    }
};

class Derived : public Base {
public:
    void printMe() override {
        std::cout << "I am the derived" << std::endl;
    }
};

int main() {
    Base a;
    Derived b;
    a.printMe();
    b.printMe();
    Base* c = &b;
    c->printMe();
    return 0;
}
The output is
I am the base
I am the derived
I am the derived
override is a specifier added in C++11.
You should use it because:
The compiler will check that a base class contains a matching virtual method. This is important since a typo in the method name or in its parameter list (overloads are allowed) can give the impression that something was overridden when it really was not.
If you use override for one method but not for another method in the same class that also overrides, some compilers can warn about the inconsistency (for example Clang's -Winconsistent-missing-override). This helps detect unwanted overrides when symbol collisions happen.
virtual doesn't mean "override". If a class doesn't use the override keyword, you can still override a base class virtual method by simply declaring the method with the same signature, even omitting the virtual keyword; the override happens implicitly. Developers wrote virtual on derived methods before C++11 to indicate their intention to override. Simply put, virtual means: this method can be overridden in subclasses.
With the code you have, if you do this
Derived derived;
Base* base_ptr = &derived;
base_ptr->printMe();
What do you think happens? It will not print out I am the derived because the method is not virtual, and the dispatch is done off the static type of the calling object (i.e. Base). If you change it to virtual, the method that is called will depend on the dynamic type of the object and not the static type.
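The difference between static and dynamic dispatch can be seen side by side in the sketch below. Base and Derived match the question's class names, while plainName, virtualName and demoStaticVsDynamic are hypothetical members and helpers that return strings instead of printing, so the result is easy to check.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    std::string plainName() const { return "base"; }            // non-virtual
    virtual std::string virtualName() const { return "base"; }  // virtual
};

struct Derived : Base {
    std::string plainName() const { return "derived"; }             // hides Base::plainName
    std::string virtualName() const override { return "derived"; }  // overrides
};

// Hypothetical self-check.
void demoStaticVsDynamic() {
    Derived derived;
    Base* base_ptr = &derived;
    assert(base_ptr->plainName() == "base");       // chosen from the static type Base
    assert(base_ptr->virtualName() == "derived");  // chosen from the dynamic type Derived
    assert(derived.plainName() == "derived");      // direct call: static type is Derived
}
```

This is why the question's experiment appeared to work: both objects were called directly through variables of their own type, so hiding and overriding looked the same.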