Let's say we have a function that passes a relatively large piece of stack-only data to another function, like this:
fn a() {
    let arr_a: [i32; 1024] = [1, 2, 3, ...];
    b(arr_a);
}

fn b(arr_b: [i32; 1024]) {
    // ... do stuff with arr_b here
}
In Rust terms, when b gets called, a's arr_a will be moved into b's arr_b. Under the hood, will the entire array always be copied on the stack, or is it possible the compiler will optimize that by simply using the data of arr_a as it is, at the memory address it is, without copying it? If the latter, which part of the compiler should be responsible for that? LLVM?
Note: I know we can guarantee the array's data doesn't get copied by using a reference/slice for example, but that's not what this question is about.
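For reference, the borrow-based alternative mentioned in the note could look roughly like the sketch below. The names sum_array and sum_slice are hypothetical helpers introduced only for illustration; they are not part of the original example.

```rust
// Hypothetical helpers, not part of the original example.

// Borrows the whole fixed-size array: only a pointer is passed to the callee.
fn sum_array(arr: &[i32; 1024]) -> i64 {
    arr.iter().map(|&x| i64::from(x)).sum()
}

// Borrows a slice: a pointer and a length are passed to the callee.
fn sum_slice(arr: &[i32]) -> i64 {
    arr.iter().map(|&x| i64::from(x)).sum()
}

fn main() {
    let arr_a: [i32; 1024] = [1; 1024];
    // The array itself stays where it is in main's stack frame.
    println!("{}", sum_array(&arr_a));
    println!("{}", sum_slice(&arr_a[..512]));
}
```

With a reference or slice, the array's data is never passed by value, so the question of whether a move gets optimized does not come up.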
Let's modify your example to simplify it a little bit:
pub fn main() {
    let mut arr_a: [u8; 1024] = [42; 1024];
    let val = arr_a[123];
    println!("{}", val);
    b(arr_a);
}

#[inline(never)]
fn b(arr_b: [u8; 1024]) {
    let val = arr_b[123];
    println!("{}", val);
}
Compiler Explorer
This compiles into the following assembly code (assuming the amd64 architecture and -C opt-level=2):
example::main::hd2bfa2df25bfe7d7:
push rbx
sub rsp, 1104
lea rbx, [rsp + 80]
mov edx, 1024
mov rdi, rbx
mov esi, 42
call qword ptr [rip + memset@GOTPCREL]
mov byte ptr [rsp + 15], 42
lea rax, [rsp + 15]
mov qword ptr [rsp + 16], rax
mov rax, qword ptr [rip + core::fmt::num::imp::<impl core::fmt::Display for u8>::fmt::ha81407c30cb780ca@GOTPCREL]
mov qword ptr [rsp + 24], rax
lea rax, [rip + .L__unnamed_1]
mov qword ptr [rsp + 32], rax
mov qword ptr [rsp + 40], 2
mov qword ptr [rsp + 64], 0
lea rax, [rsp + 16]
mov qword ptr [rsp + 48], rax
mov qword ptr [rsp + 56], 1
lea rdi, [rsp + 32]
call qword ptr [rip + std::io::stdio::_print::hd6837e34a66547dd@GOTPCREL]
mov rdi, rbx
call example::b::hea8802b300eb5620
add rsp, 1104
pop rbx
ret
example::b::hea8802b300eb5620:
sub rsp, 72
movzx eax, byte ptr [rdi + 123]
mov byte ptr [rsp + 7], al
lea rax, [rsp + 7]
mov qword ptr [rsp + 8], rax
mov rax, qword ptr [rip + core::fmt::num::imp::<impl core::fmt::Display for u8>::fmt::ha81407c30cb780ca@GOTPCREL]
mov qword ptr [rsp + 16], rax
lea rax, [rip + .L__unnamed_1]
mov qword ptr [rsp + 24], rax
mov qword ptr [rsp + 32], 2
mov qword ptr [rsp + 56], 0
lea rax, [rsp + 8]
mov qword ptr [rsp + 40], rax
mov qword ptr [rsp + 48], 1
lea rdi, [rsp + 24]
call qword ptr [rip + std::io::stdio::_print::hd6837e34a66547dd@GOTPCREL]
add rsp, 72
ret
You can see several parts in the assembly code:
- The array initialization (the call memset line), which just initializes 1024 bytes to 42.
- Printing out the value (call std::io::stdio::_print).
- Passing a pointer to the array in rdi (which is, by convention, used to pass the first argument of a function in the typical Linux amd64 ABI).
- The call to fn b().
- fn b()'s body.

Note that there is no copying involved here: the compiler sees that you don't need a copy of the value, so it just passes a pointer to the already existing array. However, keep in mind that certain operations, like printing out the memory address of the variable, or even passing it directly into println!() instead of making a local variable first, may change this behavior.
As for the "which part of the compiler should be responsible for that" part of the question: you can see the optimization being done at the Rust MIR level (this can also be seen in Compiler Explorer), so it is done by the Rust compiler rather than by LLVM:
Rust MIR:
fn main() -> () {
let mut _0: ();
let mut _1: [u8; 1024];
// ...
bb1: {
StorageDead(_4);
StorageDead(_6);
_9 = b(move _1) -> [return: bb2, unwind continue];
}
// ...
}
fn b(_1: [u8; 1024]) -> () {
debug arr_b => _1;
let mut _0: ();
let _2: u8;
let _3: ();
// ...
The _9 = b(move _1) -> [return: bb2, unwind continue]; part tells us that the array is moved into the call, so the compiler can just pass a pointer to the already existing array. The opposite would be b(copy _1), where we would get a memcpy. For comparison, the following code generates an array copy before the call to b(), because we're trying to observe the memory addresses of the values:
pub fn main() {
    let mut arr_a: [u8; 1024] = [42; 1024];
    println!("{:p}", &arr_a);
    b(arr_a);
}

#[inline(never)]
fn b(arr_b: [u8; 1024]) {
    println!("{:p}", &arr_b);
}
Compiler Explorer
And indeed, we can see the following line in the Rust MIR code:
_9 = b(copy _1) -> [return: bb2, unwind continue];
However, keep in mind that similar optimizations might be done at the LLVM IR level as well.
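To illustrate why the copy cannot always be elided: passing an array by value semantically gives the callee its own array, and because [u8; 1024] is Copy, the caller may keep using the original afterwards. A minimal sketch, where overwrite_first is a hypothetical helper that does not appear in the code above:

```rust
// Hypothetical helper: takes the array by value and mutates its own array.
#[inline(never)]
fn overwrite_first(mut arr: [u8; 1024]) -> u8 {
    arr[0] = 7;
    arr[0]
}

fn main() {
    let arr_a: [u8; 1024] = [42; 1024];
    let seen_by_callee = overwrite_first(arr_a);
    // `[u8; 1024]` is `Copy`, so `arr_a` is still usable after the call,
    // and the callee's write must not be visible through it.
    assert_eq!(seen_by_callee, 7);
    assert_eq!(arr_a[0], 42);
    println!("callee saw {}, caller still has {}", seen_by_callee, arr_a[0]);
}
```

Whenever the caller can still observe arr_a after the call, as it does here, the callee cannot be allowed to write into the caller's storage, so the two arrays have to be distinct. How exactly that is achieved in the generated code is up to the compiler.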
I've answered a similar question here, and my answer there includes a few details not covered in this answer.