I came across an interesting case while playing with zero sized types (ZSTs). A reference to an empty array will mold to a reference with any lifetime:
fn mold_slice<'a, T>(_: &'a T) -> &'a [T] {
&[]
}
I thought about how that is possible, since basically the "value" here lives on the stack frame of the function, yet the signature promises to return a reference to a value with a longer lifetime ('a
contains the function call). I came to the conclusion that it is because the empty array []
is a ZST which basically only exists statically. The compiler can "fake" the value the reference refers to.
So I tried this:
fn mold_unit<'a, T>(_: &'a T) -> &'a () {
&()
}
and then the compiler complained:
error: borrowed value does not live long enough
--> <anon>:7:6
|
7 | &()
| ^^ temporary value created here
8 | }
| - temporary value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the block at 6:40...
--> <anon>:6:41
|
6 | fn mold_unit<'a, T>(_: &'a T) -> &'a () {
| ^
It doesn't work for the unit ()
type, and it also does not work for an empty struct:
struct Empty;
// fails to compile as well
fn mold_struct<'a, T>(_: &'a T) -> &'a Empty {
&Empty
}
Somehow, the unit type and the empty struct are treated differently from the empty array. Are there any additional differences between those values besides just being ZSTs? Do the differences (&[]
fitting any lifetime and &()
, &Empty
not) nothing to do with ZSTs at all?
Playground example
It's not that []
is zero-sized (though it is), it's that []
is a constant, compile-time literal. This means the compiler can store it in the executable, rather than having to allocate it dynamically on the heap or stack. This, in turn, means that pointers to it last as long as they want, because data in the executable isn't going anywhere.
Annoyingly, this doesn't extend to something like &[0]
, because Rust isn't quite smart enough to realise that [0]
is definitely constant. You can work around this by using something like:
fn mold_slice<'a, T>(_: &'a T) -> &'a [i32] {
const C: &'static [i32] = &[0];
C
}
This trick also works with anything you can put in a const
, like ()
or Empty
.
Realistically, however, it'd be simpler to just have functions like this return a &'static
borrow, since that can be coerced to any other lifetime automatically.
Edit: the previous version noted that &[]
is not zero sized, which was a little tangential.
Do the differences (
&[]
fitting any lifetime and&()
,&Empty
not) nothing to do with ZSTs at all?
I think this is exactly the case. The compiler probably just treats arrays differently and there is no deeper reasoning behind it.
The only difference that could play a role is that &[]
is a fat pointer, consisting of the data pointer and a length. This fat pointer itself expresses the fact that there is actually no data behind it (because length=0). &()
on the other hand is just a normal pointer. Here, only the type system expresses the fact that it's not pointing to anything real. But I'm just guessing here.
To clarify: a referencing fitting any lifetime means that the reference has the 'static
lifetime. So instead of introducing some lifetime 'a
, we can just return a static reference and will have the same effect (&[]
works, the others don't).
There is an RFC which specifies that references to constexpr rvalues will be stored in the static data section of the executable, instead of the stack. After this RFC has been implemented (tracking issue), all of your example will compile, as []
, ()
and Empty
are constexpr rvalues. References to it will always be 'static
. But the important part of the RFC is that it works for non-ZSTs, too: e.g. &27
has the type &'static i32
.
To have some fun, let's look at the generated assembly (I used the amazing Compiler Explorer)! First let's try the working version:
pub fn mold_slice() -> &'static [i32] {
&[]
}
Using the -O
flag (meaning: optimizations enabled; I checked the unoptimized version, too, and it doesn't have significant differences), this is compiled down to:
mold_slice:
push rbp
mov rbp, rsp
lea rax, [rip + ref.0]
xor edx, edx
pop rbp
ret
ref.0:
The fat pointer is returned in the rax
(data pointer) and rdx
(length) registers. As you can see, the length is set to 0 (xor edx, edx
) and the data pointer is set to this mysterious ref.0
. The ref.0
is not actually referencing anything at all. It's just an empty marker. This means we return just some pointer to the data section.
Now let's just tell the compiler to trust us on &()
in order to compile it:
pub fn possibly_broken() -> &'static () {
unsafe { std::mem::transmute(&()) }
}
Result:
possibly_broken:
push rbp
mov rbp, rsp
lea rax, [rip + ref.1]
pop rbp
ret
ref.1:
Wow, we pretty much see the same result! The pointer (returned via rax
) points somewhere to the data section. So it actually is a 'static
reference after code generation. Only the lifetime checker doesn't quite know that and still refuses to compile the code. Well... I guess this is nothing dramatic, especially since the RFC mentioned above will fix that in near future.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With