Why is the produced assembly not equivalent between returning by reference and copy when inlined?

Tags:

rust

I have a small struct:

pub struct Foo {
    pub a: i32,
    pub b: i32,
    pub c: i32,
}

I was using pairs of the fields in the form (a, b), (b, c), (c, a). To avoid duplicating code, I created a utility function that would allow me to iterate over the pairs:

impl Foo {
    // Returns the three field pairs as borrowed references.
    fn get_foo_ref(&self) -> [(&i32, &i32); 3] {
        [(&self.a, &self.b), (&self.b, &self.c), (&self.c, &self.a)]
    }
}

I had to decide whether to return the values as references or to copy the i32s. Later on, I plan to switch to a non-Copy type instead of i32, so I decided to use references. I expected the resulting code to be equivalent, since everything would be inlined.
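Since the plan is to move to a non-Copy type, here is why references fit that plan. This is a minimal sketch, assuming a hypothetical generic Foo<T> (the original struct is not generic): returning [(&T, &T); 3] only borrows the fields, so it works for a type such as String without any cloning.

```rust
// Hypothetical generic version of Foo: `T` need not be Copy, because
// `get_foo_ref` only hands out shared references into `self`.
pub struct Foo<T> {
    pub a: T,
    pub b: T,
    pub c: T,
}

impl<T> Foo<T> {
    /// Returns the pairs (a, b), (b, c), (c, a) as borrowed references.
    pub fn get_foo_ref(&self) -> [(&T, &T); 3] {
        [(&self.a, &self.b), (&self.b, &self.c), (&self.c, &self.a)]
    }
}

fn main() {
    // String is not Copy, so a by-value `[(T, T); 3]` would require cloning.
    let f = Foo {
        a: String::from("x"),
        b: String::from("yy"),
        c: String::from("zzz"),
    };
    // Sum the lengths of both strings in every pair.
    let total: usize = f
        .get_foo_ref()
        .iter()
        .map(|&(l, r)| l.len() + r.len())
        .sum();
    println!("{}", total); // (1 + 2) + (2 + 3) + (3 + 1) = 12
}
```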

I am generally optimistic about optimizations, so I expected the code using this function to be equivalent to the hand-written code.

First the variant using the function:

pub fn testing_ref(f: Foo) -> i32 {
    let mut sum = 0;

    for i in 0..3 {
        let (l, r) = f.get_foo_ref()[i];

        sum += *l + *r;
    }

    sum
}
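A side note on the loop in testing_ref: it calls get_foo_ref() and indexes the result on every iteration. An arguably more idiomatic form builds the array once and iterates over it. This is only a sketch (testing_iter is a hypothetical name), and whether it changes the generated assembly is something to check on Godbolt, not something assumed here.

```rust
pub struct Foo {
    pub a: i32,
    pub b: i32,
    pub c: i32,
}

impl Foo {
    fn get_foo_ref(&self) -> [(&i32, &i32); 3] {
        [(&self.a, &self.b), (&self.b, &self.c), (&self.c, &self.a)]
    }
}

// Hypothetical variant: build the pair array once and iterate over it,
// instead of indexing a freshly built array with `[i]` on each pass.
pub fn testing_iter(f: Foo) -> i32 {
    let mut sum = 0;
    let pairs = f.get_foo_ref();
    // `&(l, r)` destructures each `&(&i32, &i32)` item into two `&i32`.
    for &(l, r) in pairs.iter() {
        sum += *l + *r;
    }
    sum
}

fn main() {
    println!("{}", testing_iter(Foo { a: 1, b: 2, c: 3 }));
}
```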

Then the hand-written variant:

pub fn testing_direct(f: Foo) -> i32 {
    let mut sum = 0;

    sum += f.a + f.b;
    sum += f.b + f.c;
    sum += f.c + f.a;

    sum
}
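The question refers to three variants, but only two are shown. The by-copy one (get_foo in the C++ comparison, and the function the answer below calls testing) is missing. A plausible reconstruction, assuming it mirrors get_foo_ref but returns the i32 values themselves:

```rust
pub struct Foo {
    pub a: i32,
    pub b: i32,
    pub c: i32,
}

impl Foo {
    // Assumed by-copy counterpart of get_foo_ref: copies the i32 values.
    fn get_foo(&self) -> [(i32, i32); 3] {
        [(self.a, self.b), (self.b, self.c), (self.c, self.a)]
    }
}

// Assumed shape of the `testing` function mentioned in the answer.
pub fn testing(f: Foo) -> i32 {
    let mut sum = 0;
    for i in 0..3 {
        let (l, r) = f.get_foo()[i];
        sum += l + r;
    }
    sum
}

fn main() {
    println!("{}", testing(Foo { a: 1, b: 2, c: 3 }));
}
```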

To my disappointment, all 3 methods resulted in different assembly code. The worst code was generated for the case with references, and the best code was the one that didn't use my utility function at all. Why is that? Shouldn't the compiler generate equivalent code in this case?

You can view the resulting assembly code on Godbolt; I also have the 'equivalent' assembly code from C++.

In C++, the compiler generated equivalent code for get_foo and get_foo_ref, although I don't understand why the code for all three cases is not equivalent.

Why did the compiler not generate equivalent code for all three cases?

Update:

I've slightly modified the code to use arrays and added one more direct case.
Rust version with f64 and arrays
C++ version with f64 and arrays
This time the code generated by C++ is exactly the same in all cases. However, the Rust assembly differs, and returning by reference results in worse assembly.

Well, I guess this is another example that nothing can be taken for granted.

asked Feb 06 '17 19:02 by Aleksander Fular





1 Answer

TL;DR: Microbenchmarks are tricky; instruction count does not directly translate into high or low performance.


Later on, I plan to switch to a non-Copy type instead of an i32, so I decided to use references.

Then, you should check the generated assembly for your new type.

In your optimized example, the compiler is being very crafty:

pub fn testing_direct(f: Foo) -> i32 {
    let mut sum = 0;

    sum += f.a + f.b;
    sum += f.b + f.c;
    sum += f.c + f.a;

    sum
}

Yields:

example::testing_direct:
        push    rbp
        mov     rbp, rsp
        mov     eax, dword ptr [rdi + 4]
        add     eax, dword ptr [rdi]
        add     eax, dword ptr [rdi + 8]
        add     eax, eax
        pop     rbp
        ret

This is roughly: sum += f.a; sum += f.b; sum += f.c; sum += sum;

That is, the compiler realized that:

  1. f.X was added twice
  2. f.X * 2 was equivalent to adding it twice

While the former may be inhibited in the other cases by the indirection, the latter is VERY specific to i32 (and to integer addition being associative and commutative, so the six additions can be freely regrouped).
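The rewrite is legal because, with overflow checks off (the release default), i32 addition wraps: it is arithmetic modulo 2^32, which is associative and commutative. So (a + b) + (b + c) + (c + a) equals 2 * (a + b + c) even when intermediate sums overflow. A small check of that identity, using wrapping_add to model release-mode semantics explicitly (pairwise_sum and sum_then_double are illustrative names, not from the question):

```rust
// Models release-mode i32 addition, which wraps on overflow
// (debug builds would panic on overflow instead).
fn pairwise_sum(a: i32, b: i32, c: i32) -> i32 {
    a.wrapping_add(b)
        .wrapping_add(b.wrapping_add(c))
        .wrapping_add(c.wrapping_add(a))
}

// The shape of the optimized assembly: sum the fields once, then double.
fn sum_then_double(a: i32, b: i32, c: i32) -> i32 {
    let s = a.wrapping_add(b).wrapping_add(c);
    s.wrapping_add(s)
}

fn main() {
    let samples = [
        (1, 2, 3),
        (-7, 100, 42),
        (i32::MAX, i32::MAX, 1), // intermediate sums overflow
        (i32::MIN, -1, i32::MAX),
    ];
    for &(a, b, c) in samples.iter() {
        assert_eq!(pairwise_sum(a, b, c), sum_then_double(a, b, c));
    }
    println!("identity holds for all samples");
}
```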

For example, switching your code to f32 (still Copy, but addition is no longer associative), I get the very same assembly for both testing_direct and testing (and slightly different assembly for testing_ref):

example::testing:
        push    rbp
        mov     rbp, rsp
        movss   xmm1, dword ptr [rdi]
        movss   xmm2, dword ptr [rdi + 4]
        movss   xmm0, dword ptr [rdi + 8]
        movaps  xmm3, xmm1
        addss   xmm3, xmm2
        xorps   xmm4, xmm4
        addss   xmm4, xmm3
        addss   xmm2, xmm0
        addss   xmm2, xmm4
        addss   xmm0, xmm1
        addss   xmm0, xmm2
        pop     rbp
        ret

And there's no trickery any longer.
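Floating-point addition is commutative but not associative: the rounding depends on how the operations are grouped, so the compiler may not regroup the additions the way it did for i32. A small f32 check, with values chosen so the rounding is visible:

```rust
fn main() {
    let (a, b, c) = (1.0e8_f32, -1.0e8_f32, 1.0_f32);

    // Same three operands, two groupings.
    let left = (a + b) + c; // 0.0 + 1.0 = 1.0
    // b + c = -99_999_999 is not representable in f32 (the spacing near 1e8
    // is 8), so it rounds back to -1e8 and the total becomes 0.0.
    let right = a + (b + c);

    println!("(a + b) + c = {}", left);
    println!("a + (b + c) = {}", right);
    assert_ne!(left, right);

    // Swapping the operands of a single addition is still fine.
    assert_eq!(a + c, c + a);
}
```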

So it's really not possible to infer much from your example; check with the real type.

answered Oct 25 '22 00:10 by Matthieu M.
