Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does moving data to Rc/Arc always copy it from the stack to the heap?

Tags:

rust

Take a look at the following simple example:

use std::rc::Rc;

struct MyStruct {
    a: i8,
}

fn main() {
    let mut my_struct = MyStruct { a: 0 };
    my_struct.a = 5;
    let my_struct_rc = Rc::new(my_struct);

    println!("my_struct_rc.a = {}", my_struct_rc.a);
}

The official documentation of Rc says:

The type Rc<T> provides shared ownership of a value of type T, allocated in the heap.

Theoretically it is clear. But, firstly my_struct is not immediately wrapped into Rc, and secondly MyStruct is a very simple type. I can see 2 scenarios here.

  1. When my_struct is moved into the Rc the memory content is literally copied from the stack to the heap.
  2. The compiler is able to resolve that my_struct will be moved into the Rc, so it puts it on the heap from the beginning.

If number 1 is true, then there might be a hidden performance bottleneck as when reading the code one does not explicitly see memory being copied (I am assuming MyStruct being much more complex).

If number 2 is true, I wonder whether the compiler is always able to resolve such things. The provided example is very simple, but I can imagine that my_struct is much more complex and is mutated several times by different functions before being moved to the Rc.

like image 622
Al Bundy Avatar asked Oct 14 '22 20:10

Al Bundy


1 Answers

Tl;dr It could be either scenario, but for the most part, you should just write code in the most obvious way and let the compiler worry about it.

According to the semantics of the abstract machine, that is, the theoretical model of computation that defines Rust's behavior, there is always a copy. In fact, there are at least two: my_struct is first created in the stack frame of main, but then has to be moved into the stack frame of Rc::new. Then Rc::new has to create an allocation and move my_struct a second time, from its own stack frame into the newly allocated memory*. Each of these moves is conceptually a copy.

However, this analysis isn't particularly useful for predicting the performance of code in practice, for three reasons:

  1. Copies are actually pretty darn cheap. Moving my_struct from one place to another may actually be much cheaper, in the long run, than referencing it with a pointer. Copying a chunk of bytes is easy to optimize on modern processors; following a pointer to some arbitrary location is not. (Bear in mind also that the complexity of the structure is irrelevant because all moves are bytewise copies; for instance, moving any Vec is just copying three usizes regardless of the contents.)

    If you haven't measured the performance and shown that excessive copying is a problem, you must not assume that it is without evidence: you may accidentally pessimize instead of optimizing your code. Measure first.

  2. The semantics of the abstract machine is not the semantics of your real machine. The whole point of an optimizing compiler is to figure out the best way to transform one to the other. Under reasonable assumptions, it's very unlikely that the code here would result in 2 copies with optimizations turned on. But how the compiler eliminates one or both copies may be dependent on the rest of the code: not just on the snippet that contains them but on how the data is initialized and so forth. Real machine performance is complicated and generally requires analysis of more than just a few lines at a time. Again, this is the whole point of an optimizing compiler: it can do a much more comprehensive analysis, much faster than you or I can.

    Even if the compiler leaves a copy "on the table", you shouldn't assume without evidence that removing the copy would make things better simply because it is a copy. Measure first.

  3. It probably doesn't matter anyway, in this case. Requesting a new allocation from the heap is likely† more expensive than copying a bunch of bytes from one place to another, so fiddling around with 1 fast copy vs. no copies while ignoring a (plausible) big bottleneck is probably a waste of time. Don't try to optimize things before you've profiled your application or library to see where the most performance is being lost. Measure first.

See also

Questions about overflowing the stack by accidentally putting large data on it (to which the solution is usually to use Vec instead of an array):

  • How to allocate arrays on the heap in Rust 1.0?
  • Thread '<main>' has overflowed its stack when allocating a large array using Box

* Rc, although part of the standard library, is written in plain Rust code, which is how I analyze it here. Rc could theoretically be subject to guaranteed optimizations that aren't available to ordinary code, but that doesn't happen to be relevant to this case.

† Depending at least on the allocator and on whether new memory must be acquired from the OS or if a recently freed allocation can be re-used.

like image 109
trent Avatar answered Oct 22 '22 04:10

trent