
Are there any rules of thumb when `virtual` is a considerable overhead?

My question is essentially stated in the title, but let me elaborate.

Question: To rephrase, how simple does a virtual method have to be for the dispatch mechanism to become a considerable overhead? Are there any rules of thumb for this? E.g. if the method takes 10 minutes, or does I/O, complex if statements, memory operations etc., then it's not a problem. But if you write virtual get_r() { return sqrt( x*x + y*y); }; and call it in a loop, you will have trouble.

I hope the question is not too general, as I am looking for general but concrete technical answers. Either it's hard or impossible to tell, or a virtual call takes roughly this many cycles, math takes this many, and I/O this many.

Maybe some technical people know some general numbers to compare, or have done some analysis and can share general conclusions. Embarrassingly, I don't know how to do that kind of fancy asm analysis =/.

I would also like to give some rationale behind it, as well as my use-case.

I think I have seen more than a few questions from people who, for the sake of performance, avoid virtuals like open fire in a forest during a drought, and just as many people asking them "Are you absolutely sure that virtual overhead is really an issue in your case?".

In my recent work I ran into a problem which, I believe, could fall on either side of the river.

Also bear in mind that I am not asking how to improve the implementation of an interface. I believe I know how to do that. I'm asking whether it's possible to tell when to do it, or which approach to choose right off the bat.

Use-case:

I run some simulations. I have a class which basically provides a run environment. There is a base class, and more than one derived class that defines a different workflow. The base class collects common logic, such as assigning I/O sources and sinks. The derived classes define particular workflows, more or less by implementing RunEnv::run(). I think this is a valid design. Now let's imagine that the objects which are the subjects of the workflow can be placed in 2D or 3D space. The workflows are common/interchangeable in both cases, so the objects we are working on can have a common interface, albeit one consisting of very simple methods like Object::get_r(). On top of that, let's have some stat logger defined for the environment.

Originally I wanted to provide some code snippets, but it ended up as 5 classes with 2-4 methods each, i.e. a wall of code. I can post it on request, but it would double the length of the question.

Key points are: RunEnv::run() is the main loop. It usually runs for a very long time (5 min to 5 h). It provides basic time instrumentation and calls RunEnv::process_iteration() and RunEnv::log_stats(). All of them are virtual. The rationale is that I can derive from RunEnv and redesign run(), for example for different stop conditions. I can redesign process_iteration(), for example to use multi-threading if I have to process a pool of objects, or to process them in various ways. Different workflows will also want to log different stats. RunEnv::log_stats() is just a call that outputs already computed interesting stats into a std::ostream. I guess using virtuals here has no real impact.
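For concreteness, the structure described above might look roughly like the following sketch. It is not the original code: `finished()`, `FixedStepsEnv` and `demo_run()` are hypothetical names introduced only for illustration, and the time instrumentation is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Base class: owns the main loop and the output sink.
class RunEnv {
public:
    explicit RunEnv(std::ostream& sink) : sink_(sink) {}
    virtual ~RunEnv() = default;

    // Main loop. Each virtual call below happens once per iteration.
    virtual void run() {
        while (!finished()) {
            process_iteration();
            log_stats(sink_);
        }
    }

    virtual bool finished() const = 0;  // hypothetical stop condition hook
    virtual void process_iteration() = 0;
    virtual void log_stats(std::ostream& out) const = 0;

private:
    std::ostream& sink_;
};

// Hypothetical workflow: stop after a fixed number of iterations.
class FixedStepsEnv : public RunEnv {
public:
    FixedStepsEnv(std::ostream& sink, std::size_t steps)
        : RunEnv(sink), steps_(steps) {}

    bool finished() const override { return done_ == steps_; }
    void process_iteration() override { ++done_; }
    void log_stats(std::ostream& out) const override {
        out << "iteration " << done_ << '\n';
    }

private:
    std::size_t steps_;
    std::size_t done_ = 0;
};

// Runs three iterations through the base interface and returns the log.
inline std::string demo_run() {
    std::ostringstream log;
    FixedStepsEnv env(log, 3);
    RunEnv& base = env;  // dispatch through the base class
    base.run();
    assert(!log.str().empty());
    return log.str();
}
```

In a layout like this, the virtual calls sit at the per-iteration level, so their cost is paid once per iteration rather than once per object.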

Now let's say an iteration works by calculating the distance of each object to the origin. So we have the interface double Obj::get_r();, with Obj implementations for the 2D and 3D cases. In both cases the getter is simple math with 2-3 multiplications and additions.

I also experimented with different memory handling. E.g. sometimes the coordinate data was stored in private variables and sometimes in a shared pool, so even get_x() could be made virtual, with the implementation get_x(){return x;}; or get_x(){ return pool[my_num*dim+x_offset]; };. Imagine calculating something with get_r(){ return sqrt(get_x()*get_x() + get_y()*get_y()) ;};. I suspect virtuality here would kill performance.
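The nested-virtual scenario from the last paragraph could be sketched as follows. The class names `MemberObj` and `PoolObj`, the constants `kDim`, `kXOffset`, `kYOffset`, and the helper `sum_r()` are assumptions made for this illustration; only `get_x()`, `get_y()`, `get_r()` and the pool indexing scheme come from the text.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t kDim = 2;      // coordinates per object in the pool
constexpr std::size_t kXOffset = 0;
constexpr std::size_t kYOffset = 1;

class Obj {
public:
    virtual ~Obj() = default;
    virtual double get_x() const = 0;
    virtual double get_y() const = 0;

    // Non-virtual, but it pays for two virtual calls per evaluation.
    double get_r() const {
        const double x = get_x();
        const double y = get_y();
        return std::sqrt(x * x + y * y);
    }
};

// Coordinates stored in private members.
class MemberObj : public Obj {
public:
    MemberObj(double x, double y) : x_(x), y_(y) {}
    double get_x() const override { return x_; }
    double get_y() const override { return y_; }

private:
    double x_, y_;
};

// Coordinates stored in a shared pool, addressed by object number.
class PoolObj : public Obj {
public:
    PoolObj(const std::vector<double>& pool, std::size_t my_num)
        : pool_(pool), my_num_(my_num) {
        assert(my_num_ * kDim + kYOffset < pool_.size());
    }
    double get_x() const override { return pool_[my_num_ * kDim + kXOffset]; }
    double get_y() const override { return pool_[my_num_ * kDim + kYOffset]; }

private:
    const std::vector<double>& pool_;
    std::size_t my_num_;
};

// The hot loop: every get_r() triggers two indirect calls.
inline double sum_r(const std::vector<const Obj*>& objs) {
    double total = 0.0;
    for (const Obj* o : objs) total += o->get_r();
    return total;
}
```

Here the getter bodies are a single load each, so the dispatch is of the same order as the work itself. This is the case where measuring is most likely to show a difference.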

asked Jul 08 '13 17:07 by luk32


2 Answers

A virtual method call in C++ on x86 yields code similar to the following (single inheritance):

    mov ecx,[esp+4]
    mov eax,[ecx]       // pointer to vtable
    jmp [eax]           

A non-virtual member function call spares you one mov instruction (the vtable load) compared to this. So, in the case of single inheritance, the performance hit is negligible.

If you have multiple inheritance or, worse, virtual inheritance, virtual calls can be much more complex. But that is more a problem of class hierarchy and architecture.
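To make the assembly above more tangible, here is a hand-rolled approximation of what single-inheritance dispatch does. This is only a sketch: the real vtable layout is implementation-defined, and `Point`, `PointVTable`, `call_virtual()` and `call_direct()` are names invented for this example.

```cpp
#include <cassert>
#include <cmath>

struct Point;

// The "vtable": one function pointer per virtual method.
struct PointVTable {
    double (*get_r)(const Point*);
};

struct Point {
    const PointVTable* vptr;  // plays the role of the hidden vtable pointer
    double x, y;
};

inline double point_get_r(const Point* p) {
    return std::sqrt(p->x * p->x + p->y * p->y);
}

static const PointVTable point_vtable = {&point_get_r};

// Virtual-style call: load vptr, load the slot, then call indirectly.
inline double call_virtual(const Point* p) {
    assert(p->vptr != nullptr);
    return p->vptr->get_r(p);
}

// Non-virtual call: the target is known at compile time.
inline double call_direct(const Point* p) {
    return point_get_r(p);
}
```

Besides the extra load, a direct call can be inlined into the caller, whereas an indirect call generally cannot unless the compiler manages to prove the dynamic type. For very small bodies called in a tight loop, that lost inlining may matter more than the extra instruction.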

The rule of thumb:

If the body of the method is many times (>100x) slower than a single mov instruction, just use virtual and don't bother. Otherwise, profile your bottlenecks and optimize.

Update:

For the multiple/virtual inheritance cases, check out this page: http://www.lrdev.com/lr/c/virtual.html

answered Oct 22 '22 02:10 by Sergey K.


Are there any rules of thumb for this?

The best, most general rule of thumb for questions like this one is:

measure your code before optimizing

Trying to make your code perform well without measuring is a sure path to unnecessarily complex code that's optimized in all the wrong places.

So, don't worry about the overhead of a virtual function until you have some solid evidence that the virtual is the problem. If you do have such evidence, then you can work to remove the virtual in that case. More likely, though, you'll find that finding ways to speed up your calculations, or to avoid calculating where you don't need to, will yield much larger performance improvements. But again, don't just guess -- measure first.
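As a starting point for such a measurement, a minimal timing sketch might look like the one below. `Shape`, `Point2D`, `Timed`, `time_it()`, `sum_virtual()` and `sum_direct()` are hypothetical names, and a crude wall-clock loop like this is no substitute for a real profiler or a benchmarking library; it only illustrates the idea of comparing both variants on your own data and build flags.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double get_r() const = 0;
};

struct Point2D final : Shape {
    double x, y;
    Point2D(double x_, double y_) : x(x_), y(y_) {}
    double get_r() const override { return std::sqrt(x * x + y * y); }
};

struct Timed {
    double sum;      // returned so the loop is not optimized away
    double seconds;  // wall-clock time of the loop
};

// Times a callable that returns a double.
template <typename F>
Timed time_it(F&& body) {
    const auto start = std::chrono::steady_clock::now();
    const double sum = body();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {sum, elapsed.count()};
}

// Calls get_r() through base-class pointers (virtual dispatch).
inline Timed sum_virtual(const std::vector<std::unique_ptr<Shape>>& objs) {
    return time_it([&] {
        double s = 0.0;
        for (const auto& o : objs) s += o->get_r();
        return s;
    });
}

// Calls get_r() on the concrete final type (can be resolved statically).
inline Timed sum_direct(const std::vector<Point2D>& pts) {
    return time_it([&] {
        double s = 0.0;
        for (const auto& p : pts) s += p.get_r();
        return s;
    });
}
```

Run both functions on the same data, several times, with optimizations enabled, and compare the `seconds` fields. The two containers also differ in memory layout (pointers versus contiguous values), so a difference here is not necessarily caused by dispatch alone.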

answered Oct 22 '22 01:10 by Caleb