I have a small struct:
pub struct Foo {
    pub a: i32,
    pub b: i32,
    pub c: i32,
}
I was using pairs of the fields in the form (a,b), (b,c), (c,a). To avoid duplication of the code, I created a utility function which would allow me to iterate over the pairs:
impl Foo {
    fn get_foo_ref(&self) -> [(&i32, &i32); 3] {
        [(&self.a, &self.b), (&self.b, &self.c), (&self.c, &self.a)]
    }
}
I had to decide if I should return the values as references or copy the i32. Later on, I plan to switch to a non-Copy type instead of an i32, so I decided to use references. I expected the resulting code to be equivalent since everything would be inlined.
I am generally optimistic about optimizations, so I suspected that code using this function would compile to the same thing as the hand-written version.
First the variant using the function:
pub fn testing_ref(f: Foo) -> i32 {
    let mut sum = 0;
    for i in 0..3 {
        let (l, r) = f.get_foo_ref()[i];
        sum += *l + *r;
    }
    sum
}
Then the hand-written variant:
pub fn testing_direct(f: Foo) -> i32 {
    let mut sum = 0;
    sum += f.a + f.b;
    sum += f.b + f.c;
    sum += f.c + f.a;
    sum
}
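The third variant, the one that goes through a by-value helper (referred to as get_foo and testing elsewhere in this thread), is not reproduced here. A minimal sketch of what it might look like, assuming it simply mirrors get_foo_ref and testing_ref but copies the i32 fields; the bodies below are an assumption, not the original code:

```rust
pub struct Foo {
    pub a: i32,
    pub b: i32,
    pub c: i32,
}

impl Foo {
    // Assumed by-value counterpart of get_foo_ref: copies the fields
    // instead of borrowing them.
    fn get_foo(&self) -> [(i32, i32); 3] {
        [(self.a, self.b), (self.b, self.c), (self.c, self.a)]
    }
}

// Assumed shape of the by-value test function, mirroring testing_ref.
pub fn testing(f: Foo) -> i32 {
    let mut sum = 0;
    for i in 0..3 {
        let (l, r) = f.get_foo()[i];
        sum += l + r;
    }
    sum
}
```

For a Foo with a = 1, b = 2, c = 3, this returns 12, the same value the hand-written variant computes.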
To my disappointment, all 3 methods resulted in different assembly code. The worst code was generated for the case with references, and the best code was the one that didn't use my utility function at all. Why is that? Shouldn't the compiler generate equivalent code in this case?
You can view the resulting assembly code on Godbolt; I also have the 'equivalent' assembly code from C++.
In C++, the compiler generated equivalent code between get_foo and get_foo_ref, although I don't understand why the code for all 3 cases is not equivalent.
Why did the compiler not generate equivalent code for all 3 cases?
Update:
I've slightly modified the code to use arrays and to add one more direct case.
Rust version with f64 and arrays
C++ version with f64 and arrays
This time the generated code in C++ is exactly the same across the variants. However, the Rust assembly differs, and returning by reference results in worse assembly.
Well, I guess this is another example that nothing can be taken for granted.
Pass-by-reference can be more efficient than pass-by-value because it does not copy the arguments. The formal parameter is an alias for the argument: when the called function reads or writes the formal parameter, it actually reads or writes the argument itself.
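As a small illustration of the aliasing described above, here is a sketch in Rust. Both function names are hypothetical and exist only for this example: one takes a mutable reference and writes through it to the caller's value, the other works on a copy.

```rust
// Hypothetical helper: the parameter refers to the caller's value,
// so the write is visible to the caller.
pub fn add_one_by_ref(x: &mut i32) {
    *x += 1;
}

// Hypothetical helper: the parameter is a copy of the caller's value,
// so the caller's variable is left untouched.
pub fn add_one_by_value(mut x: i32) -> i32 {
    x += 1;
    x
}
```

Calling add_one_by_ref(&mut n) changes n itself, while add_one_by_value(m) returns a new value and leaves m as it was.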
Functions can be declared to return a reference type. There are two reasons to make such a declaration: The information being returned is a large enough object that returning a reference is more efficient than returning a copy.
The major difference is that pointers can be operated on, for example by adding values to them, whereas references are just an alias for another variable. Functions in C++ can return a reference much as they return a pointer: when a function returns a reference, it returns an implicit pointer.
At a low level, a parameter passed by reference is implemented using a pointer, whereas primitive return values are typically passed directly in registers. So returning primitives by value is likely to perform better.
TL;DR: Microbenchmarks are trickery, instruction count does not directly translate into high/low performance.
Later on, I plan to switch to a non-Copy type instead of an i32, so I decided to use references.
Then, you should check the generated assembly for your new type.
In your optimized example, the compiler is being very crafty:
pub fn testing_direct(f: Foo) -> i32 {
    let mut sum = 0;
    sum += f.a + f.b;
    sum += f.b + f.c;
    sum += f.c + f.a;
    sum
}
Yields:
example::testing_direct:
    push rbp
    mov rbp, rsp
    mov eax, dword ptr [rdi + 4]
    add eax, dword ptr [rdi]
    add eax, dword ptr [rdi + 8]
    add eax, eax
    pop rbp
    ret
Which is roughly: sum += f.a; sum += f.b; sum += f.c; sum += sum;
That is, the compiler realized that:
- each f.X was added twice
- f.X * 2 was equivalent to adding it twice
While the former may be inhibited in the other cases by the use of indirection, the latter is VERY specific to i32 (and to integer addition being associative, which lets the compiler regroup the additions).
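To make the regrouping concrete, here is a minimal sketch. The names sum_pairs and sum_doubled are hypothetical, and wrapping_add is used on the assumption that overflow checks are off (the default for optimized builds), so that i32 addition wraps. Under that assumption the two forms agree for every input, including ones that overflow, which is what allows one to be substituted for the other:

```rust
// The sum as written in testing_direct, with wrapping arithmetic made explicit.
pub fn sum_pairs(a: i32, b: i32, c: i32) -> i32 {
    a.wrapping_add(b)
        .wrapping_add(b.wrapping_add(c))
        .wrapping_add(c.wrapping_add(a))
}

// The regrouped form the assembly corresponds to: (a + b + c) added to itself.
pub fn sum_doubled(a: i32, b: i32, c: i32) -> i32 {
    let s = a.wrapping_add(b).wrapping_add(c);
    s.wrapping_add(s)
}
```

Because wrapping addition is arithmetic modulo 2^32, the order and grouping of the additions cannot change the result.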
For example, switching your code to f32 (still Copy, but addition is no longer associative), I get the very same assembly for both testing_direct and testing (and slightly different for testing_ref):
example::testing:
    push rbp
    mov rbp, rsp
    movss xmm1, dword ptr [rdi]
    movss xmm2, dword ptr [rdi + 4]
    movss xmm0, dword ptr [rdi + 8]
    movaps xmm3, xmm1
    addss xmm3, xmm2
    xorps xmm4, xmm4
    addss xmm4, xmm3
    addss xmm2, xmm0
    addss xmm2, xmm4
    addss xmm0, xmm1
    addss xmm0, xmm2
    pop rbp
    ret
And there's no trickery any longer.
So it's really not possible to infer much from your example; check with the real type.
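A small sketch of why the integer rewrite is not available for f32: floating-point addition can give different results depending on how the operands are grouped, so the compiler has to keep the additions in the order written. The function names and sample values here are illustrative only.

```rust
// Adds a and b first, then c.
pub fn left_grouped(a: f32, b: f32, c: f32) -> f32 {
    (a + b) + c
}

// Adds b and c first, then a.
pub fn right_grouped(a: f32, b: f32, c: f32) -> f32 {
    a + (b + c)
}
```

With a = 1.0e20, b = -1.0e20 and c = 1.0, left_grouped returns 1.0 while right_grouped returns 0.0, because the 1.0 is lost to rounding when it is added to -1.0e20 first.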