devirtualize: to change a virtual/polymorphic/indirect function call into a static function call due to some guarantee that the change is correct -- source: myself
Given a simple trait object, &dyn ToString, created with a statically known type, String:
fn main() {
    let name: &dyn ToString = &String::from("Steve");
    println!("{}", name.to_string());
}
Does the call to .to_string() use <String as ToString>::to_string() directly? Or only indirectly via the trait's vtable? If indirectly, would it be possible to devirtualize this call? Or is there something fundamental that hinders this optimization?
The motivating code for this question is much more complicated; it uses async trait functions and I'm wondering if returning a Box<dyn Future> can be optimized in some cases.
Does Rust devirtualize trait object function calls?
No.
Rust is a language, it doesn't do anything; it only prescribes semantics.
In this specific case, the Rust language neither prescribes nor forbids devirtualization, so an implementation is permitted, but not required, to do it.
At the moment, the only stable implementation is rustc, with the LLVM backend -- though you can use the cranelift backend if you feel adventurous.
You can test your code against this implementation on the playground: select "Show LLVM IR" instead of "Run" and "Release" instead of "Debug", and you should be able to check that there is no virtual call.
A revised version of the code isolates the cast to trait + dynamic call to make it easier:
#[inline(never)]
fn to_string(s: &String) -> String {
    let name: &dyn ToString = s;
    name.to_string()
}

fn main() {
    let name = String::from("Steve");
    let name = to_string(&name);
    println!("{}", name);
}
When run on the playground, this yields, among other things:
; playground::to_string
; Function Attrs: noinline nonlazybind uwtable
define internal fastcc void @_ZN10playground9to_string17h4a25abbd46fc29d4E(%"std::string::String"* noalias nocapture dereferenceable(24) %0, %"std::string::String"* noalias readonly align 8 dereferenceable(24) %s) unnamed_addr #0 {
start:
; call <alloc::string::String as core::clone::Clone>::clone
tail call void @"_ZN60_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$5clone17h1e3037d7443348baE"(%"std::string::String"* noalias nocapture nonnull sret dereferenceable(24) %0, %"std::string::String"* noalias nonnull readonly align 8 dereferenceable(24) %s)
ret void
}
Here you can clearly see that the call to ToString::to_string has been replaced by a simple call to <String as Clone>::clone: a devirtualized call.
The motivating code for this question is much more complicated; it uses async trait functions and I'm wondering if returning a Box<dyn Future> can be optimized in some cases.
Unfortunately, you cannot draw any conclusion from the above example.
Optimizations are finicky. In essence, most optimizations are akin to pattern-matching and replacing with regexes: differences that look benign to a human may completely throw off the pattern-matching and prevent the optimization from applying.
The only way to be certain that the optimization is applied in your case, if it matters, is to inspect the emitted assembly.
But, really, in this case, I'd be more worried about the memory allocation than about the virtual call. A virtual call is about 5ns of overhead -- though it does inhibit a number of optimizations -- whereas a memory allocation (and the eventual deallocation) routinely costs 20ns - 30ns.
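To make that trade-off concrete, here is a minimal sketch contrasting a function that returns a boxed, type-erased future with one that returns an opaque concrete future. All names (NoopWake, poll_once, boxed_answer, unboxed_answer) are hypothetical, and std::future::ready stands in for a real async computation; this is not the asker's code. As written in the source, the first variant allocates and is polled through a vtable unless the optimizer manages to see through it, while the second needs neither.

```rust
use std::future::{ready, Future};
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical waker that does nothing; sufficient for futures that are
/// ready on the first poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future exactly once. `?Sized` lets it accept `dyn Future` too.
fn poll_once<F: Future + ?Sized>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

/// Type-erased: the future is moved to the heap and polled via its vtable.
fn boxed_answer() -> Pin<Box<dyn Future<Output = u32>>> {
    Box::pin(ready(42))
}

/// Opaque but concrete: the caller knows the type statically, so no
/// allocation and no dynamic dispatch are required.
fn unboxed_answer() -> impl Future<Output = u32> {
    ready(42)
}

fn main() {
    let mut boxed = boxed_answer();
    println!("{:?}", poll_once(boxed.as_mut()));

    // Pinned on the stack instead of the heap.
    let mut unboxed = pin!(unboxed_answer());
    println!("{:?}", poll_once(unboxed.as_mut()));
}
```

Whether the boxed variant ends up devirtualized, or its allocation elided, in a given program can only be checked by inspecting the emitted LLVM IR or assembly for that program.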
Does the call to .to_string() use <String as ToString>::to_string() directly? Or only indirectly via the trait's vtable?
We can test this case by writing two functions, one that uses dyn ToString, and one that uses the concrete type String directly:
pub fn dyn_to_string() {
    let name: &dyn ToString = &String::from("Steve");
    println!("{}", name.to_string());
}

pub fn concrete_to_string() {
    let name: &String = &String::from("Steve");
    println!("{}", name.to_string());
}
And now we can view the generated assembly:
playground::dyn_to_string:
...
callq *<alloc::string::String as core::clone::Clone>::clone@GOTPCREL(%rip)
movq %rbx, 24(%rsp)
leaq <alloc::string::String as core::fmt::Display>::fmt(%rip), %rax
As you can see, dyn_to_string is optimized to use <String as Clone>::clone directly instead of indirectly through a vtable: it was devirtualized. In fact, the concrete implementation is exactly the same as the trait object call:
.set playground::concrete_to_string, playground::dyn_to_string
However, to answer the broader question:
Does Rust devirtualize trait object function calls?
It depends. The compiler cannot always perform devirtualization. It did in the above code, but in other cases, it might not. You should not expect that a trait object call will be devirtualized. Generic functions are monomorphized, so their calls are always statically dispatched. Trait objects carry no such guarantee.
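As a rough illustration of that difference, the sketch below puts the two dispatch styles side by side and adds a case where the concrete type is only chosen at run time. The function names (describe_generic, describe_dyn, pick) are hypothetical and only serve the example. In the last case the compiler has no single concrete type to substitute, so it is reasonable to assume the call stays virtual, though only the emitted assembly can confirm that.

```rust
/// Static dispatch: a separate copy is generated for each `T`, and the
/// call to `to_string` is resolved at compile time.
fn describe_generic<T: ToString>(value: &T) -> String {
    value.to_string()
}

/// Dynamic dispatch: `to_string` is looked up in the vtable carried by the
/// trait object, unless the optimizer can prove the concrete type.
fn describe_dyn(value: &dyn ToString) -> String {
    value.to_string()
}

/// The concrete type behind the trait object depends on a run-time value.
fn pick(use_number: bool) -> Box<dyn ToString> {
    if use_number {
        Box::new(42_i32)
    } else {
        Box::new(String::from("Steve"))
    }
}

fn main() {
    let name = String::from("Steve");
    println!("{}", describe_generic(&name));
    println!("{}", describe_dyn(&name));

    // Decided by the command line, so not known when compiling.
    let use_number = std::env::args().count() > 1;
    let picked = pick(use_number);
    println!("{}", describe_dyn(&*picked));
}
```

If the call has to be direct, the generic form is the one that guarantees it; the trait object form leaves the decision to the optimizer.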