
Does move always copy the data?

Let's say we have a function that passes a relatively large piece of stack-only data to another function, like this:

fn a() {
    let arr_a: [i32; 1024] = std::array::from_fn(|i| i as i32 + 1); // 1, 2, 3, ...
    b(arr_a);
}

fn b(arr_b: [i32; 1024]) {
    // ... do stuff with arr_b here
}

In Rust terms, when b gets called, a's arr_a will be moved into b's arr_b. Under the hood, will the entire array always be copied on the stack, or is it possible the compiler will optimize that by simply using the data of arr_a as it is, at the memory address it is, without copying it? If the latter, which part of the compiler should be responsible for that? LLVM?

Note: I know we can guarantee the array's data doesn't get copied by using a reference/slice for example, but that's not what this question is about.

Asked May 31 '26 04:05 by at54321


1 Answer

Let's modify your example to simplify it a little bit:

pub fn main() {
    let arr_a: [u8; 1024] = [42; 1024];
    let val = arr_a[123];
    println!("{}", val);
    b(arr_a);
}

#[inline(never)]
fn b(arr_b: [u8; 1024]) {
    let val = arr_b[123];
    println!("{}", val);
}

Compiler Explorer

This compiles into the following assembly code (assuming the amd64 architecture and -C opt-level=2):

example::main::hd2bfa2df25bfe7d7:
        push    rbx
        sub     rsp, 1104
        lea     rbx, [rsp + 80]
        mov     edx, 1024
        mov     rdi, rbx
        mov     esi, 42
        call    qword ptr [rip + memset@GOTPCREL]
        mov     byte ptr [rsp + 15], 42
        lea     rax, [rsp + 15]
        mov     qword ptr [rsp + 16], rax
        mov     rax, qword ptr [rip + core::fmt::num::imp::<impl core::fmt::Display for u8>::fmt::ha81407c30cb780ca@GOTPCREL]
        mov     qword ptr [rsp + 24], rax
        lea     rax, [rip + .L__unnamed_1]
        mov     qword ptr [rsp + 32], rax
        mov     qword ptr [rsp + 40], 2
        mov     qword ptr [rsp + 64], 0
        lea     rax, [rsp + 16]
        mov     qword ptr [rsp + 48], rax
        mov     qword ptr [rsp + 56], 1
        lea     rdi, [rsp + 32]
        call    qword ptr [rip + std::io::stdio::_print::hd6837e34a66547dd@GOTPCREL]
        mov     rdi, rbx
        call    example::b::hea8802b300eb5620
        add     rsp, 1104
        pop     rbx
        ret

example::b::hea8802b300eb5620:
        sub     rsp, 72
        movzx   eax, byte ptr [rdi + 123]
        mov     byte ptr [rsp + 7], al
        lea     rax, [rsp + 7]
        mov     qword ptr [rsp + 8], rax
        mov     rax, qword ptr [rip + core::fmt::num::imp::<impl core::fmt::Display for u8>::fmt::ha81407c30cb780ca@GOTPCREL]
        mov     qword ptr [rsp + 16], rax
        lea     rax, [rip + .L__unnamed_1]
        mov     qword ptr [rsp + 24], rax
        mov     qword ptr [rsp + 32], 2
        mov     qword ptr [rsp + 56], 0
        lea     rax, [rsp + 8]
        mov     qword ptr [rsp + 40], rax
        mov     qword ptr [rsp + 48], 1
        lea     rdi, [rsp + 24]
        call    qword ptr [rip + std::io::stdio::_print::hd6837e34a66547dd@GOTPCREL]
        add     rsp, 72
        ret

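The assembly code breaks down into several parts:

  1. Creating the array (up to and including the call to memset), which just initializes 1024 bytes to 42.
  2. Getting the element at index 123 and displaying it (up to the call to std::io::stdio::_print). In main, the read has been constant-folded: the code stores the immediate value 42 instead of loading it from the array.
  3. Moving the array's address into rdi (mov rdi, rbx), the register that by convention carries the first function argument in the typical Linux amd64 ABI.
  4. Calling fn b().
  5. Step 2 again, this time inside fn b()'s body, where the element is loaded through the pointer received in rdi (movzx eax, byte ptr [rdi + 123]).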

Note that no copying is involved here: the compiler can see that a separate copy of the value is not needed, so it just passes a pointer to the already existing array. Keep in mind, however, that certain operations may change this behavior, such as printing the memory address of the variable, or even passing the array element directly to println!() instead of binding it to a local variable first.
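Whether or not the generated code copies the bytes, the language-level meaning stays the same: the callee owns a value that is independent of the caller's. The sketch below illustrates that guarantee. The helper name overwrite is hypothetical and not part of the original example, and the sketch relies on [u8; 1024] being Copy (which it is, because u8 is Copy), so arr_a stays usable after the call.

```rust
// Hypothetical helper: mutates its own by-value parameter and returns
// the element it just wrote.
fn overwrite(mut arr_b: [u8; 1024]) -> u8 {
    arr_b[123] = 7;
    arr_b[123]
}

fn main() {
    let arr_a: [u8; 1024] = [42; 1024];

    // Passing by value: the callee may freely mutate what it received.
    let seen_by_callee = overwrite(arr_a);

    // [u8; 1024] is Copy, so arr_a is still usable here, and the
    // callee's write must not be visible through it.
    assert_eq!(seen_by_callee, 7);
    assert_eq!(arr_a[123], 42);
    println!("{} {}", seen_by_callee, arr_a[123]);
}
```

This is why skipping the copy is only an optimization: the compiler can reuse the caller's storage only when doing so cannot be observed, for instance when the caller never touches the array again.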

As for the question of which part of the compiler is responsible: the optimization is already visible at the Rust MIR level (which you can also inspect in Compiler Explorer), so it is done by the Rust compiler itself rather than by LLVM:

Rust MIR:

fn main() -> () {
    let mut _0: ();
    let mut _1: [u8; 1024];
    // ...

    bb1: {
        StorageDead(_4);
        StorageDead(_6);
        _9 = b(move _1) -> [return: bb2, unwind continue];
    }
    
    // ...
}


fn b(_1: [u8; 1024]) -> () {
    debug arr_b => _1;
    let mut _0: ();
    let _2: u8;
    let _3: ();
    // ...

The _9 = b(move _1) -> [return: bb2, unwind continue]; line tells us that the compiler will just pass a pointer to the already existing array. The alternative would be b(copy _1), which results in a memcpy. For comparison, the following code generates an array copy before the call to b(), because it observes the memory addresses of both values:

pub fn main() {
    let arr_a: [u8; 1024] = [42; 1024];
    println!("{:p}", &arr_a);
    b(arr_a);
}

#[inline(never)]
fn b(arr_b: [u8; 1024]) {
    println!("{:p}", &arr_b);
}

Compiler Explorer

And indeed, we can see the following line in the Rust MIR code:

        _9 = b(copy _1) -> [return: bb2, unwind continue];

However, keep in mind that similar optimizations might be done on the LLVM IR level as well.
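If you need a guarantee rather than an optimization, one option besides references is to keep the array on the heap: moving a Box moves only the pointer, so the array bytes stay where they are. Below is a minimal sketch of that idea. The function name heap_addr is hypothetical, and the pointer-to-usize casts are there only to compare addresses.

```rust
// Hypothetical helper: takes ownership of the boxed array and reports
// the address of the heap allocation it points to.
fn heap_addr(boxed: Box<[u8; 1024]>) -> usize {
    &*boxed as *const [u8; 1024] as usize
}

fn main() {
    let boxed: Box<[u8; 1024]> = Box::new([42u8; 1024]);

    // Address of the array data before the Box is moved.
    let before = &*boxed as *const [u8; 1024] as usize;

    // Moving the Box transfers ownership of the same allocation.
    let after = heap_addr(boxed);

    assert_eq!(before, after);
    println!("array data stayed at the same address: {}", before == after);
}
```

The trade-off is a heap allocation and an extra indirection, so this only pays off when avoiding the copy matters more than those costs.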


I've answered a similar question here, and that answer includes a few details not covered in this one.

Answered Jun 02 '26 21:06 by m4tx


