
Lifetime differences between references to zero sized types

Tags:

rust

I came across an interesting case while playing with zero sized types (ZSTs). A reference to an empty array will mold to a reference with any lifetime:

fn mold_slice<'a, T>(_: &'a T) -> &'a [T] {
    &[]
}

I thought about how that is possible, since basically the "value" here lives on the stack frame of the function, yet the signature promises to return a reference to a value with a longer lifetime ('a contains the function call). I came to the conclusion that it is because the empty array [] is a ZST which basically only exists statically. The compiler can "fake" the value the reference refers to.

So I tried this:

fn mold_unit<'a, T>(_: &'a T) -> &'a () {
    &()
}

and then the compiler complained:

error: borrowed value does not live long enough
 --> <anon>:7:6
  |
7 |     &()
  |      ^^ temporary value created here
8 | }
  | - temporary value only lives until here
  |
note: borrowed value must be valid for the lifetime 'a as defined on the block at 6:40...
 --> <anon>:6:41
  |
6 | fn mold_unit<'a, T>(_: &'a T) -> &'a () {
  |                                         ^

It doesn't work for the unit () type, and it also does not work for an empty struct:

struct Empty;

// fails to compile as well
fn mold_struct<'a, T>(_: &'a T) -> &'a Empty {
    &Empty
}

Somehow, the unit type and the empty struct are treated differently from the empty array. Are there any additional differences between those values besides just being ZSTs? Do the differences (&[] fitting any lifetime and &(), &Empty not) have nothing to do with ZSTs at all?


jtepe asked Jan 27 '17 16:01


2 Answers

It's not that [] is zero-sized (though it is), it's that [] is a constant, compile-time literal. This means the compiler can store it in the executable, rather than having to allocate it dynamically on the heap or stack. This, in turn, means that references to it can be given any lifetime, because data in the executable isn't going anywhere.

Annoyingly, this doesn't extend to something like &[0], because Rust isn't quite smart enough to realise that [0] is definitely constant. You can work around this by using something like:

fn mold_slice<'a, T>(_: &'a T) -> &'a [i32] {
    const C: &'static [i32] = &[0];
    C
}

This trick also works with anything you can put in a const, like () or Empty.
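
As a rough sketch of that, here are the two failing functions from the question rewritten with the const trick. The constant names UNIT and EMPTY are made up for illustration, and the derive on Empty is only there so the values can be compared:

```rust
// `Empty` mirrors the empty struct from the question; the derives are an
// addition so that references to it can be compared and printed.
#[derive(Debug, PartialEq)]
struct Empty;

// Hypothetical constant names. A const holding a reference is 'static,
// so it can be handed out under any shorter lifetime 'a.
const UNIT: &'static () = &();
const EMPTY: &'static Empty = &Empty;

fn mold_unit<'a, T>(_: &'a T) -> &'a () {
    UNIT
}

fn mold_struct<'a, T>(_: &'a T) -> &'a Empty {
    EMPTY
}
```

Calling mold_unit(&x) or mold_struct(&x) with any local x should then type-check, because the returned 'static reference is simply shortened to the lifetime of the argument.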

Realistically, however, it'd be simpler to just have functions like this return a &'static borrow, since that can be coerced to any other lifetime automatically.
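
A minimal sketch of what that could look like. The function names empty_slice and or_empty are hypothetical and only serve to show the 'static reference being accepted where a shorter lifetime 'a is expected:

```rust
// Returns a reference to the empty array with the 'static lifetime.
fn empty_slice() -> &'static [i32] {
    &[]
}

// The result type only promises lifetime 'a, but the 'static reference
// from `empty_slice` is accepted in the `None` arm without any annotation.
fn or_empty<'a>(maybe: Option<&'a [i32]>) -> &'a [i32] {
    match maybe {
        Some(slice) => slice,
        None => empty_slice(),
    }
}
```

The lifetime parameter and the unused argument from the original mold_slice are no longer needed here; the caller decides how short a lifetime it wants.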

Edit: the previous version noted that &[] is not zero sized, which was a little tangential.

DK. answered Oct 19 '22 06:10


Do the differences (&[] fitting any lifetime and &(), &Empty not) have nothing to do with ZSTs at all?

I think this is exactly the case. The compiler probably just treats arrays differently and there is no deeper reasoning behind it.

The only difference that could play a role is that &[] is a fat pointer, consisting of the data pointer and a length. This fat pointer itself expresses the fact that there is actually no data behind it (because length=0). &() on the other hand is just a normal pointer. Here, only the type system expresses the fact that it's not pointing to anything real. But I'm just guessing here.
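
The fat versus thin pointer difference can be observed with std::mem::size_of. This is only a sketch: the helper function names are made up, and the exact size of a slice reference is an implementation detail that current compilers happen to follow (data pointer plus length), not something to rely on:

```rust
use std::mem::size_of;

// Size of a reference to a slice: data pointer plus length (a fat pointer).
fn slice_ref_size() -> usize {
    size_of::<&[i32]>()
}

// Size of a reference to the unit type: just one ordinary (thin) pointer.
fn unit_ref_size() -> usize {
    size_of::<&()>()
}

// Both pointees are zero-sized, so the pointee size is not what differs.
fn pointee_sizes() -> (usize, usize) {
    (size_of::<[i32; 0]>(), size_of::<()>())
}
```

On current compilers slice_ref_size() comes out as twice the size of usize and unit_ref_size() as exactly one usize, while both pointee sizes are 0.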


To clarify: a reference fitting any lifetime means that the reference has the 'static lifetime. So instead of introducing some lifetime 'a, we can just return a static reference and will have the same effect (&[] works, the others don't).

There is an RFC which specifies that references to constexpr rvalues will be stored in the static data section of the executable, instead of the stack. After this RFC has been implemented (tracking issue), all of your examples will compile, as [], () and Empty are constexpr rvalues. References to them will always be 'static. But the important part of the RFC is that it works for non-ZSTs, too: e.g. &27 has the type &'static i32.
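
For illustration, this is roughly what the examples look like on a compiler where the RFC is implemented. To my knowledge this is RFC 1414 (rvalue static promotion), available in stable Rust since 1.21, but treat that version number as an assumption and check it against your toolchain. The function name answer is made up:

```rust
struct Empty;

// With rvalue static promotion, `&()` refers to static data, not to a
// temporary on the stack, so it can satisfy any lifetime 'a.
fn mold_unit<'a, T>(_: &'a T) -> &'a () {
    &()
}

// The same holds for the empty struct from the question.
fn mold_struct<'a, T>(_: &'a T) -> &'a Empty {
    &Empty
}

// It also works for a value that is not zero-sized.
fn answer() -> &'static i32 {
    &27
}
```

On such a compiler neither the const workaround nor the transmute is needed for these cases.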


To have some fun, let's look at the generated assembly (I used the amazing Compiler Explorer)! First let's try the working version:

pub fn mold_slice() -> &'static [i32] {
    &[]
}

Using the -O flag (meaning: optimizations enabled; I checked the unoptimized version, too, and it doesn't have significant differences), this is compiled down to:

mold_slice:
        push    rbp
        mov     rbp, rsp
        lea     rax, [rip + ref.0]
        xor     edx, edx
        pop     rbp
        ret

ref.0:

The fat pointer is returned in the rax (data pointer) and rdx (length) registers. As you can see, the length is set to 0 (xor edx, edx) and the data pointer is set to this mysterious ref.0. The ref.0 is not actually referencing anything at all. It's just an empty marker. This means we return just some pointer to the data section.

Now let's just tell the compiler to trust us on &() in order to compile it:

pub fn possibly_broken() -> &'static () {
    unsafe { std::mem::transmute(&()) } 
}

Result:

possibly_broken:
        push    rbp
        mov     rbp, rsp
        lea     rax, [rip + ref.1]
        pop     rbp
        ret

ref.1:

Wow, we pretty much see the same result! The pointer (returned via rax) points somewhere into the data section. So it actually is a 'static reference after code generation. Only the lifetime checker doesn't quite know that and still refuses to compile the code. Well... I guess this is nothing dramatic, especially since the RFC mentioned above will fix that in the near future.

Lukas Kalbertodt answered Oct 19 '22 07:10