What are the actual runtime performance costs of dynamic dispatch?

There's some background on this topic in the Rust book section on static and dynamic dispatch, but the tl;dr is that calling a method on a trait reference, and a few other situations (function pointers, etc.), results in dynamic instead of static dispatch.

What is the actual runtime cost of this, after optimizations have been applied?

For example, imagine this set of structs & traits:

struct Buffer;
struct TmpBuffer;
struct TmpMutBuffer;

impl BufferType for Buffer { ... }
impl BufferType for TmpBuffer { ... }
impl BufferType for TmpMutBuffer { ... }

impl Buffer2D for BufferType { ... }

impl Buffer2DExt for Buffer2D { ... }

Notice that the traits here are implemented on traits themselves.

What is the calling cost of dynamic dispatch to invoke a method from Buffer2DExt on a struct reference?

The recent question What are Rust's exact auto-dereferencing rules? covers the dereferencing rules; are these rules applied at compile time or at runtime?

Doug asked Feb 20 '15


1 Answer

Disclaimer: the question is fairly open-ended, so this answer might be incomplete. Treat it with an even bigger grain of salt than you would normally.

Rust uses a simple "virtual table" (vtable) to implement dynamic dispatch: a trait-object pointer is a pair of a data pointer and a pointer to a table of function addresses, and each method call loads its target from that table. This strategy is also used in C++, for which you can see a study here; the study is a bit dated, though.
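
For illustration, here is a minimal sketch of static versus dynamic dispatch; the Draw and Point names are made up for this example, not taken from the question:

trait Draw {
    fn draw(&self);
}

struct Point;

impl Draw for Point {
    fn draw(&self) { println!("point"); }
}

fn main() {
    let p = Point;

    // Static dispatch: the callee is known at compile time.
    p.draw();

    // Dynamic dispatch: `obj` is a fat pointer (data pointer + vtable
    // pointer) and the call loads its target from the vtable at runtime.
    let obj: &dyn Draw = &p;
    obj.draw();

    // A trait-object reference is twice the size of a plain reference,
    // because it carries the vtable pointer alongside the data pointer.
    assert_eq!(
        std::mem::size_of::<&dyn Draw>(),
        2 * std::mem::size_of::<&Point>()
    );
}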

The cost of indirection

Virtual dispatch induces indirection, which has a cost for multiple reasons (a short sketch follows the list):

  • indirection is opaque: it inhibits inlining and constant propagation, which are key enablers for many compiler optimizations
  • indirection has a runtime cost: if the branch target is incorrectly predicted, you are looking at pipeline stalls and expensive memory fetches
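
To make the inlining point concrete, here is a hedged sketch; the apply_static and apply_dynamic helpers are hypothetical names written for this example:

fn apply_static<F: Fn(u32) -> u32>(f: F, x: u32) -> u32 {
    // `F` is a concrete type here, so the optimizer sees the closure's
    // body and can inline it and constant-propagate `x`.
    f(x)
}

fn apply_dynamic(f: &dyn Fn(u32) -> u32, x: u32) -> u32 {
    // `f` is a trait object; without devirtualization the call is an
    // opaque indirect jump through the vtable.
    f(x)
}

fn main() {
    let double = |x| x * 2;
    assert_eq!(apply_static(double, 21), 42);
    assert_eq!(apply_dynamic(&double, 21), 42);
}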

Optimizing indirection

Compilers, however, muddy the waters by trying their best to optimize the indirection away.

  • devirtualization: sometimes the compiler can resolve the virtual table look-up at compile time (usually, because it knows the concrete type of the object); if so, it can therefore use a regular function call rather than an indirect one, and optimize away the indirection
  • probabilistic devirtualization: last year Honza Hubička introduced a new optimization in GCC (read the 5-part series as it is very instructive). The gist of the strategy is to build the inheritance graph to make an educated guess at the potential type(s), and then use a pattern like if v.hasType(A) { v.A::call() } elif v.hasType(B) { v.B::call() } else { v.virtual-call() }; special-casing the most likely types means regular calls in those cases, and therefore inlined/constant-propagated/full-goodies calls.

This latter strategy could be fairly interesting in Rust due to the coherence rules and privacy rules, as it should have more cases where the full "inheritance" graph is provably known.
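
This kind of special-casing can even be written by hand. Below is a hedged sketch of the same pattern in Rust, using std::any::Any to test for the most likely concrete types first; the Shape, Circle and Square names are illustrative, not from the answer:

use std::any::Any;

trait Shape {
    fn area(&self) -> f64;
    fn as_any(&self) -> &dyn Any;
}

struct Circle { r: f64 }
impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
    fn as_any(&self) -> &dyn Any { self }
}

struct Square { s: f64 }
impl Shape for Square {
    fn area(&self) -> f64 { self.s * self.s }
    fn as_any(&self) -> &dyn Any { self }
}

// Hand-rolled analogue of probabilistic devirtualization: test the most
// likely concrete types first so those paths use direct, inlinable
// calls, and keep the virtual call as a fallback.
fn area_of(v: &dyn Shape) -> f64 {
    if let Some(c) = v.as_any().downcast_ref::<Circle>() {
        c.area() // direct call: the concrete type is known here
    } else if let Some(s) = v.as_any().downcast_ref::<Square>() {
        s.area() // direct call
    } else {
        v.area() // fallback: regular virtual call through the vtable
    }
}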

The cost of monomorphization

In Rust, you can use compile-time polymorphism instead of run-time polymorphism: the compiler will emit one version of the function for each unique combination of compile-time parameters it takes (this is monomorphization). This, itself, has a cost (see the sketch after the list):

  • compilation time cost: more code to be produced, more code to be optimized
  • binary size cost: the produced binary will end-up being larger, a typical size/speed trade-off
  • run-time cost: possibly, the larger code size might lead to cache misses at CPU level
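
As a sketch of the trade-off (the Area, Circle and Square names are again illustrative), the generic function below is duplicated per type, while the trait-object version is emitted once:

trait Area {
    fn area(&self) -> f64;
}

struct Circle { r: f64 }
impl Area for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}

struct Square { s: f64 }
impl Area for Square {
    fn area(&self) -> f64 { self.s * self.s }
}

// Monomorphized: the compiler emits one specialized copy of this
// function per concrete `T` used in the program, each with `area`
// directly callable (and inlinable).
fn print_area<T: Area>(x: &T) {
    println!("{}", x.area());
}

// Trait object: a single copy is emitted; every call to `area` goes
// through the vtable instead.
fn print_area_dyn(x: &dyn Area) {
    println!("{}", x.area());
}

fn main() {
    print_area(&Circle { r: 1.0 });     // instantiates print_area::<Circle>
    print_area(&Square { s: 2.0 });     // instantiates print_area::<Square>
    print_area_dyn(&Circle { r: 1.0 }); // same single function for all types
}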

The compiler might be able to merge specialized functions that end up having the same implementation (because of phantom types, for example); however, it is still more likely than not that the produced binaries (executables and libraries) will end up larger.

As usual with performance, you have to measure in your case what is more beneficial.

Matthieu M. answered Sep 30 '22