What is the difference between how references and Box<T> are represented in memory?

Tags:

rust

I am trying to understand how references and Box<T> work. Let's consider a code example:

fn main() {
    let x = 5;
    let y = &x;

    assert_eq!(5, x);
    assert_eq!(5, *y);
}

In my imagination, Rust saves the value in memory as:

[Diagram omitted: the asker's sketch of how x and y are laid out in memory]

Consider this second code snippet with Box<T>:

fn main() {
    let x = 5;
    let y = Box::new(x);

    assert_eq!(5, x);
    assert_eq!(5, *y);
}

How is x going to be stored in Box? What does the memory look like?

The examples above are from the chapter "Treating Smart Pointers Like Regular References with the Deref Trait" of The Rust Programming Language. For the second example, the book explains:

The only difference between Listing 15-7 and Listing 15-6 is that here we set y to be an instance of a box pointing to the value in x rather than a reference pointing to the value of x.

Does it mean that y in the box points directly to value 5?

asked Jan 24 '20 13:01 by softshipper




2 Answers

Your diagram for the simple case is fine, although it may be unclear as you use 5 for both the value and the address. I've moved y in my diagram to prevent any confusion.

What does memory look like for a Box<T>?

The equivalent diagram for Box would look similar, but with the addition of the heap:

    Stack

     ADDR                    VALUE
    +------------------------------+
x = |0x0001|                     5 |
y = |0x0002|                0xFF01 |
    |0x0003|                       |
    |0x0004|                       |
    |0x0005|                       |
    +------------------------------+

    Heap

     ADDR                    VALUE
    +------------------------------+
    |0xFF01|                     5 |
    |0xFF02|                       |
    |0xFF03|                       |
    |0xFF04|                       |
    |0xFF05|                       |
    +------------------------------+

(See the pedantic notes below about this diagram)

Box has allocated enough space in the heap for us, here at address 0xFF01. The value is then moved from the stack onto the heap.
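To see the three locations from the diagram on a real machine, here is a small sketch. The helper name `addresses` is hypothetical, and the printed addresses will differ between runs and platforms. Pointers are cast to `usize` only so they can be printed and compared; note that because `i32` is `Copy`, `Box::new(x)` copies the 5 into the new heap slot.

```rust
/// Hypothetical helper: returns (address of `x`, address of the Box `y`,
/// address of the heap value `*y`) as plain integers for comparison.
fn addresses() -> (usize, usize, usize) {
    let x: i32 = 5;
    let y: Box<i32> = Box::new(x); // copies 5 into a fresh heap allocation

    let x_addr = &x as *const i32 as usize; // stack slot holding 5
    let y_addr = &y as *const Box<i32> as usize; // stack slot holding the pointer
    let heap_addr = &*y as *const i32 as usize; // heap slot holding the copy of 5

    // Formatting a Box with {:p} prints the heap address it points to.
    println!("x  at {:#x} (stack)", x_addr);
    println!("y  at {:#x} (stack, holds {:p})", y_addr, y);
    println!("*y at {:#x} (heap)", heap_addr);

    (x_addr, y_addr, heap_addr)
}

fn main() {
    let (x_addr, y_addr, heap_addr) = addresses();
    // The heap value lives somewhere other than both stack slots.
    assert_ne!(x_addr, heap_addr);
    assert_ne!(y_addr, heap_addr);
}
```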

Does it mean that y in the box points directly to value 5?

It does not. y holds the pointer to the data allocated by the Box. It must do this in order to be able to free the allocated memory when the Box goes out of scope.
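One way to observe the cleanup that this pointer makes possible is to box a type with a `Drop` impl. The sketch below assumes a hypothetical `Noisy` type that only records, in a `Cell`, that its destructor ran; the Box runs that destructor and then frees its heap allocation when it goes out of scope.

```rust
use std::cell::Cell;

/// Hypothetical type that records when it is dropped.
struct Noisy<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        // Called when the owning Box goes out of scope,
        // just before the Box frees the heap memory.
        self.dropped.set(true);
    }
}

fn drop_happens_at_scope_end() -> bool {
    let dropped = Cell::new(false);
    {
        let boxed = Box::new(Noisy { dropped: &dropped });
        assert!(!boxed.dropped.get()); // still alive inside the scope
    } // `boxed` goes out of scope: Noisy::drop runs, then the allocation is freed
    dropped.get()
}

fn main() {
    assert!(drop_happens_at_scope_end());
    println!("the Box cleaned up its heap value at the end of the scope");
}
```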

The point of the chapter you are reading is that Rust will transparently dereference the Box for you, so you don't usually need to concern yourself with this fact.
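As a small illustration of that transparent dereferencing, here is a sketch with a hypothetical function `double` that only accepts `&i32`. Passing `&b` where `b: Box<i32>` works because of deref coercion: the compiler inserts the dereference from `&Box<i32>` to `&i32`.

```rust
/// Takes a plain reference; knows nothing about Box.
fn double(v: &i32) -> i32 {
    *v * 2
}

fn main() {
    let x = 5;
    let r = &x;
    let b = Box::new(x);

    // Explicit dereference looks the same for both.
    assert_eq!(*r, 5);
    assert_eq!(*b, 5);

    // Deref coercion: `&b` (a `&Box<i32>`) is turned into `&i32` automatically.
    assert_eq!(double(r), 10);
    assert_eq!(double(&b), 10);
}
```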

See also:

  • Do I need to Box child structs of a Boxed struct to get everything on the heap?
  • What is the difference between Vec<i32> and Vec<Box<i32>>?
  • Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?
  • What are Rust's exact auto-dereferencing rules?
  • How do I get an owned value out of a `Box`?

What's the difference in memory?

This might bend your brain a little bit!

Looking at the stack for both examples, there isn't really a difference between the two cases: both the reference and the Box are stored on the stack as a pointer. The only difference is in the code, which knows to treat the value on the stack differently depending on whether it's a reference or a Box.

In fact, this is true for everything in Rust! To the computer, it's all just bits, and the structure encoded in the program binary is the only thing that distinguishes one blob of bytes from another.
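A quick way to check the "it's all just bits" point is to compare sizes and then erase the types down to raw pointers. This is a sketch assuming a target where `&i32` and `Box<i32>` are thin pointers, which is the case on all mainstream platforms.

```rust
use std::mem::size_of;

fn main() {
    // Both are a single pointer-sized value on the stack.
    assert_eq!(size_of::<&i32>(), size_of::<usize>());
    assert_eq!(size_of::<Box<i32>>(), size_of::<usize>());

    let x = 5;
    let r: &i32 = &x; // points into the stack
    let b: Box<i32> = Box::new(5); // points into the heap

    // Strip the types away: now both are just addresses of an i32.
    let p_ref: *const i32 = r;
    let p_box: *const i32 = &*b;
    assert_eq!(unsafe { *p_ref }, unsafe { *p_box });

    println!("&i32 and Box<i32> are both {} bytes here", size_of::<Box<i32>>());
}
```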

Why is x still on the stack after being moved to the Box?

Observant readers will note that I left the value 5 for x on the stack. There are two relevant reasons why:

  1. That's actually what happens in memory. Programs don't usually "reset" values they are done with as it would be unneeded overhead. Rust avoids problems by marking the variable as moved and disallowing access to the moved-from variable.

  2. In this case, i32 implements Copy, which means that it's OK to access the value after it's been moved. The compiler will actually allow us to continue accessing x. This wouldn't be true if x were a type that didn't implement Copy, such as a String or a Box.
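The difference between the two cases can be sketched with two hypothetical helpers. `box_copy` compiles only because `i32` is `Copy`; in `box_move`, uncommenting the marked line produces a "use of moved value" error because `String` is not `Copy`.

```rust
/// Boxes an i32 and returns both the original and the box.
/// This compiles because i32 is Copy: `x` is still valid after Box::new(x).
fn box_copy(x: i32) -> (i32, Box<i32>) {
    let y = Box::new(x);
    (x, y)
}

/// Boxes a String. After Box::new(s), `s` has been moved and can't be used.
fn box_move(s: String) -> Box<String> {
    let t = Box::new(s);
    // println!("{}", s); // error[E0382]: borrow of moved value: `s`
    t
}

fn main() {
    let (x, y) = box_copy(5);
    assert_eq!(x, *y);

    let t = box_move(String::from("hello"));
    assert_eq!(*t, "hello");
}
```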

See also:

  • Why does "move" in Rust not actually move?
  • How does Rust move stack variables that are not Copyable?
  • How does Rust provide move semantics?
  • What are move semantics in Rust?

Pedantic diagram notes

  • This diagram is not to scale. An i32 takes 4 bytes and a pointer or reference takes a platform-dependent number of bytes, but it's simpler to assume everything is the same size.

  • The stack typically starts at a high address and grows downward, while the heap starts at a low address and grows upward.

answered Oct 17 '22 11:10 by Shepmaster


While the general rule is exactly the same as in the answer to "What are the differences between Rust's `String` and `str`?", I'm answering here as well.

A Rust reference is (almost) exactly what you have described: a pointer to the value somewhere in memory. (Not always: references to slices also carry a length, and references to trait objects also carry a pointer to a vtable. These are called fat pointers.) A Box<T>, on the other hand, is a value like any other value in Rust, so the difference is clear: one is a reference to a place in memory, and the other is a value somewhere in memory. The confusion comes from the fact that Box<T> internally contains a pointer, and the memory that pointer refers to is allocated on the heap instead of on the stack. The difference between the two is that the stack is function-local and quite small (on my macOS it is at most 8192 KiB), while a heap allocation can outlive the function that created it.
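The fat pointers mentioned above can be checked with `std::mem::size_of`. This sketch assumes a typical platform where a fat pointer is exactly two machine words (address plus length, or address plus vtable pointer); `Box` follows the same layout rules as references.

```rust
use std::fmt::Debug;
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>();

    // Slice references: address + length.
    assert_eq!(size_of::<&[u8]>(), 2 * word);
    assert_eq!(size_of::<&str>(), 2 * word);

    // Trait-object references: address + pointer to a vtable.
    assert_eq!(size_of::<&dyn Debug>(), 2 * word);

    // Boxes of unsized types are fat in the same way.
    assert_eq!(size_of::<Box<[u8]>>(), 2 * word);
    assert_eq!(size_of::<Box<dyn Debug>>(), 2 * word);

    let v = [1u8, 2, 3];
    let s: &[u8] = &v;
    println!("slice of len {} starting at {:p}", s.len(), s.as_ptr());
}
```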

Because the stack is function-local, you cannot do something like this (it is rejected for a few reasons):

// Does not compile: `a` is dropped when foo() returns,
// so a reference to it would dangle.
fn foo() -> &u32 {
    let a = 5;

    &a
}

The most important reason is that a will not be there after foo() returns. Its stack memory is not necessarily wiped, but it can be reused and overwritten by the next function call. Dereferencing such a dangling pointer is undefined behavior in C and C++; in Rust, returning it is a compile-time error, because Rust does not allow any undefined behavior in code that does not use unsafe.

On the other hand, if you do:

fn foo() -> Box<u32> {
    let a = Box::new(5);

    a
}

A few things relevant to us will happen:

  • memory will be allocated on the heap. This memory is independent of the current function's scope, which means that it needs to be freed when it is no longer needed
  • the Box itself is moved out of the function, so there are no lifetimes involved
  • ownership of a will be moved to the caller
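To see that only the pointer moves while the heap value stays put, here is a sketch with a hypothetical variant of `foo` that also reports the heap address. The address seen inside the function and the one seen by the caller are the same, because returning a Box copies the pointer, not the allocation.

```rust
/// Hypothetical variant of `foo` that also reports where the value lives.
fn foo_with_addr() -> (Box<u32>, usize) {
    let a = Box::new(5);
    let heap_addr = &*a as *const u32 as usize; // address of the heap allocation
    (a, heap_addr) // ownership of the Box (the pointer) moves to the caller
}

fn main() {
    let (b, addr_inside) = foo_with_addr();
    let addr_outside = &*b as *const u32 as usize;

    // The heap value never moved; only the pointer was handed over.
    assert_eq!(addr_inside, addr_outside);
    assert_eq!(*b, 5);
} // `b` goes out of scope here and the heap memory is freed
```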

For convenience, Box<T> behaves like a reference in many cases, as the two can often be used interchangeably. For comparison, here is a C function that provides similar functionality to the second example:

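#include <stdlib.h>

/* Allocates an int on the heap; the caller owns it and must free() it. */
int* foo(void) {
  int* a = malloc(sizeof(int));
  if (a == NULL) return NULL;  /* allocation failed */
  *a = 5;

  return a;
}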

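As you can see, the pointer stores the address of the heap memory and is passed on to the caller, which then becomes responsible for calling free(). Box<T> does the same, except that Rust frees the memory automatically when the Box goes out of scope.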
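The C version can be mirrored almost line for line in Rust by opting out of the Box's automatic cleanup. This is only a sketch of the analogy, with hypothetical names `foo_raw` and `free_raw`: `Box::into_raw` plays the role of handing out the `malloc`ed pointer, and `Box::from_raw` followed by a drop plays the role of `free`.

```rust
/// Rust counterpart of the C `foo`: hands the caller a raw heap pointer.
fn foo_raw() -> *mut i32 {
    // Allocate on the heap, then give up the Box's automatic cleanup.
    Box::into_raw(Box::new(5))
}

/// Counterpart of C's `free`, returning the value for convenience.
///
/// # Safety
/// `p` must have come from `Box::into_raw` and must not be used afterwards.
unsafe fn free_raw(p: *mut i32) -> i32 {
    // Re-wrap the pointer in a Box so its destructor frees the memory.
    let b = unsafe { Box::from_raw(p) };
    *b
} // `b` is dropped here and the heap memory is freed

fn main() {
    let p = foo_raw();
    // As in C, the caller can write through the raw pointer...
    unsafe { *p += 1 };
    // ...and is responsible for freeing it exactly once.
    let value = unsafe { free_raw(p) };
    assert_eq!(value, 6);
}
```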

answered Oct 17 '22 11:10 by Hauleth