Is it Ok to have a reference with incorrect (larger than correct) lifetime in scope?

Does having a reference &'a T immediately cause UB (undefined behavior) if 'a is longer than the lifetime of the referenced value? Or is it fine to have such a reference as long as it does not actually outlive the referenced value of type T?

As a comparison: mem::transmute::<u8, bool>(2) is immediate UB, even if you never access the returned value. The same is true of a reference with value 0, because references must always be valid, even if you never access them. On the other hand, holding ptr::null() is not a problem until you try to dereference the null pointer.
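The contrast drawn above can be sketched in code. This is only an illustration, not part of the original question: the helper names `checked_bool` and `null_is_harmless` are hypothetical, and the sketch deliberately performs only the well-defined operations (a checked `u8` to `bool` conversion instead of `transmute`, and holding a null raw pointer without dereferencing it).

```rust
use std::ptr;

/// Hypothetical helper: converts a byte to `bool` without `transmute`.
/// Only 0 and 1 are valid bit patterns for `bool`; anything else is rejected
/// instead of producing an invalid value.
pub fn checked_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // `mem::transmute::<u8, bool>(byte)` would be UB here
    }
}

/// Hypothetical helper: creates a null raw pointer and only inspects it.
/// Holding or comparing the pointer is fine; dereferencing it would be UB.
pub fn null_is_harmless() -> bool {
    let p: *const char = ptr::null();
    p.is_null()
}
```

Here `checked_bool(2)` yields `None` rather than an invalid `bool`, while the null raw pointer can be created and inspected without any `unsafe` block.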

Consider this code:

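use std::mem;

fn main() {
    let x = '🦀';
    let r_correct: &char = &x;

    {
        // Extend the lifetime to 'static; the pointer value itself is unchanged.
        let r_incorrect: &'static char = unsafe { mem::transmute(r_correct) };
        println!("{}", r_incorrect);
    }
}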

In this code, there are two references to x. Neither outlives x. But the type of r_incorrect is clearly a lie because x does not live forever.

Does this code exhibit well defined behavior? I see three options:

  • (a) This code exhibits undefined behavior.
  • (b) This code's behavior is well defined ("safe").
  • (c) Rust has not yet defined the rules about this part of the language.
Lukas Kalbertodt asked Jul 20 '21 11:07



3 Answers

No. Undefined Behaviour would only occur if you accessed r_incorrect after x has gone out of scope, which you are not doing here.

Lifetime annotations in Rust are checked by the compiler to make sure you are not doing anything that would cause memory unsafety, but, assuming the borrow checker is happy, they have no impact on the binary that is produced or on how long a variable actually lives.

In your example, you are claiming to the compiler that the lifetime of r_incorrect is much longer than it really is, but there is no problem because you only access it within its valid lifetime.

The danger with this is that future changes to the code could attempt to use r_incorrect beyond its true lifetime. The compiler cannot prevent that from happening because you have already insisted that it's okay.
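The claim that lifetime annotations do not affect the produced binary can be illustrated with a small sketch. It assumes, as this answer argues, that using the extended reference while `x` is alive is fine; the function name `same_address_after_transmute` is hypothetical. The sketch checks that the transmuted reference is the very same pointer as the original one.

```rust
use std::mem;
use std::ptr;

/// Hypothetical demonstration: extends a lifetime with `transmute` and checks
/// that the result is the same pointer, used only while `x` is alive.
pub fn same_address_after_transmute() -> (bool, char) {
    let x = '🦀';
    let r_correct: &char = &x;
    // SAFETY (assumed, following the answer above): `r_incorrect` is only
    // used inside this function, while `x` is still alive.
    let r_incorrect: &'static char = unsafe { mem::transmute(r_correct) };
    // Both references coerce to `*const char`; only the addresses are compared.
    (ptr::eq(r_correct, r_incorrect), *r_incorrect)
}
```

Only the type-level lifetime differs between the two references; nothing about the pointer value changes.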

Peter Hall answered Oct 13 '22 01:10


To the best of my knowledge, there is no official resource explicitly stating whether or not having and/or dereferencing a reference with larger than correct lifetime results in undefined behavior. However, there are multiple resources that talk about undefined behavior in Rust and dereferencing references with unbounded lifetimes that give hints about the definedness of doing this.

Things causing Undefined Behavior according to Rustonomicon

Quote from the Rustonomicon, chapter "What Unsafe Can Do" (bold highlighting and [italic text in brackets] in this and in all following quotes is by me):

Unlike C, Undefined Behavior is pretty limited in scope in Rust. All the core language cares about is preventing the following things:

  • Dereferencing (using the * operator on) dangling or unaligned pointers (see below)
  • Breaking the pointer aliasing rules
  • Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI.
  • Causing a data race
  • Executing code compiled with target features that the current thread of execution does not support
  • Producing invalid values (either alone or as a field of a compound type such as enum/struct/array/tuple):
    • [lots of subitems that are irrelevant to reference lifetimes]

"Producing" a value happens any time a value is assigned, passed to a function/primitive operation or returned from a function/primitive operation.

A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same allocation (so in particular they all have to be part of some allocation). The span of bytes it points to is determined by the pointer value and the size of the pointee type. As a consequence, if the span is empty, "dangling" is the same as "non-null". Note that slices and strings point to their entire range, so it's important that the length metadata is never too large (in particular, allocations and therefore slices and strings cannot be bigger than isize::MAX bytes). If for some reason this is too cumbersome, consider using raw pointers.

That's it. That's all the causes of Undefined Behavior baked into Rust. Of course, unsafe functions and traits are free to declare arbitrary other constraints that a program must maintain to avoid Undefined Behavior.

Only the first two bullet points are related to pointers and/or references.

  • The first point is about dereferencing dangling and unaligned pointers.

    • Unaligned pointers: Transmuting the lifetime of a reference cannot change its alignedness, so this is not a problem here.
    • Dangling pointers: During the correct lifetime, a reference cannot be dangling according to the definition above. Therefore, during the correct lifetime, the transmuted reference is also not dangling and can be dereferenced without causing undefined behavior.
  • The second point is about the pointer aliasing rules of Rust. Quote from the Rustonomicon, chapter "References" (which is also linked in the quote above when talking about pointer aliasing rules):

    There are two kinds of reference:

    • Shared reference: &
    • Mutable reference: &mut

    Which obey the following rules:

    • A reference cannot outlive its referent
    • A mutable reference cannot be aliased

    That's it. That's the whole model references follow.

    Of course, we should probably define what aliased means.

    error[E0425]: cannot find value `aliased` in this scope
     --> <rust.rs>:2:20
      |
    2 |     println!("{}", aliased);
      |                    ^^^^^^^ not found in this scope
    
    error: aborting due to previous error
    

    Unfortunately, Rust hasn't actually defined its aliasing model. 🙀

    While we wait for the Rust devs to specify the semantics of their language, let's use the next section to discuss what aliasing is in general, and why it matters.

    • That first point does indeed not sound that good for our case – "a reference cannot outlive its referent". However, quote from the Rustonomicon, chapter "Lifetimes", section "The area covered by a lifetime":

      The lifetime (sometimes called a borrow) is alive from the place it is created to its last use. The borrowed thing needs to outlive only borrows that are alive.

      So, as we are not using the reference after the end of the correct lifetime, it is not alive anymore. Therefore, the referent also does not need to be alive after the end of the correct lifetime.

    • The second point is only about mutable references – in your example, a shared reference is used. And either way, as long as the original value is not used while the transmuted reference is alive, there is no pointer aliasing going on by any definition (though, as the Rustonomicon says, there is no aliasing model defined for Rust, so language lawyering about this is hard...)

Conclusion – Things causing Undefined Behavior according to Rustonomicon

The enumeration of things triggering undefined behavior in the Rustonomicon does not contain having and dereferencing a reference with a larger than correct lifetime as long as this reference is not accessed after the end of the correct lifetime.

Things causing Undefined Behavior according to the Rust Reference

The Rustonomicon is not the only documentation talking about undefined behavior. Quote from the Rust Reference, chapter "Behavior considered undefined":

Rust code is incorrect if it exhibits any of the behaviors in the following list. This includes code within unsafe blocks and unsafe functions. unsafe only means that avoiding undefined behavior is on the programmer; it does not change anything about the fact that Rust programs must never cause undefined behavior.

It is the programmer's responsibility when writing unsafe code to ensure that any safe code interacting with the unsafe code cannot trigger these behaviors. unsafe code that satisfies this property for any safe client is called sound; if unsafe code can be misused by safe code to exhibit undefined behavior, it is unsound.


⚠️ Warning: The following list is not exhaustive. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. The following list is just what we know for sure is undefined behavior. Please read the Rustonomicon before writing unsafe code.


  • Data races.
  • Evaluating a dereference expression (*expr) on a raw pointer that is dangling or unaligned, even in place expression context (e.g. addr_of!(&*expr)).
  • Breaking the pointer aliasing rules. &mut T and &T follow LLVM’s scoped noalias model, except if the &T contains an UnsafeCell<U>.
  • Mutating immutable data. All data inside a const item is immutable. Moreover, all data reached through a shared reference or data owned by an immutable binding is immutable, unless that data is contained within an UnsafeCell<U>.
  • Invoking undefined behavior via compiler intrinsics.
  • Executing code compiled with platform features that the current platform does not support (see target_feature).
  • Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI.
  • Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
    • [lots of subitems that are irrelevant to reference lifetimes]

Note: Uninitialized memory is also implicitly invalid for any type that has a restricted set of valid values. In other words, the only cases in which reading uninitialized memory is permitted are inside unions and in "padding" (the gaps between the fields/elements of a type).


Note: Undefined behavior affects the entire program. For example, calling a function in C that exhibits undefined behavior of C means your entire program contains undefined behaviour that can also affect the Rust code. And vice versa, undefined behavior in Rust can cause adverse affects on code executed by any FFI calls to other languages.

Dangling pointers

A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same allocation (so in particular they all have to be part of some allocation). The span of bytes it points to is determined by the pointer value and the size of the pointee type (using size_of_val). As a consequence, if the span is empty, "dangling" is the same as "non-null". Note that slices and strings point to their entire range, so it is important that the length metadata is never too large. In particular, allocations and therefore slices and strings cannot be bigger than isize::MAX bytes.

This list, including the two points in bold, is more or less equivalent to the list from the Rustonomicon (though a bit less strict, as the first bold bullet point only forbids dereferencing dangling raw pointers, not dereferencing dangling references; I guess this is an oversight). There are a few interesting links to the LLVM documentation, but in the end, the result is the same: according to this list, having a reference with a larger than correct lifetime does not result in undefined behavior as long as the reference is not dereferenced after the end of the correct lifetime. However, there is an additional note here:

⚠️ Warning: The following list is not exhaustive. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. The following list is just what we know for sure is undefined behavior. Please read the [Rustonomicon] before writing unsafe code.

Conclusion – Things causing Undefined Behavior according to the Rust Reference

The Rust Reference does not contain an exhaustive enumeration of all things that can trigger undefined behavior. While the Rust Reference does not explicitly say that references with larger than correct lifetime do trigger undefined behavior, it also does not explicitly say that they don't.

Rustonomicon on unbounded lifetimes

Quote from the Rustonomicon, chapter "Unbounded Lifetimes":

Unsafe code can often end up producing references or lifetimes out of thin air. Such lifetimes come into the world as unbounded. The most common source of this is dereferencing a raw pointer, which produces a reference with an unbounded lifetime. Such a lifetime becomes as big as context demands. This is in fact more powerful than simply becoming 'static, because for instance &'static &'a T will fail to typecheck, but the unbound lifetime will perfectly mold into &'a &'a T as needed. However for most intents and purposes, such an unbounded lifetime can be regarded as 'static.

Almost no reference is 'static, so this is probably wrong. transmute and transmute_copy are the two other primary offenders. One should endeavor to bound an unbounded lifetime as quickly as possible, especially across function boundaries.

The Rustonomicon says that one should "endeavor to bound an unbounded lifetime as quickly as possible, especially across function boundaries". It gives no indication that dereferencing a reference with an unbounded lifetime, assuming that the referent is still alive, would result in undefined behavior. As dereferencing a reference is a common operation, I can't imagine the Rustonomicon not pointing out such an obvious gotcha. I thereby conclude that dereferencing a reference with an unbounded lifetime does not result in undefined behavior as long as the referent is still alive.

However, the question is not about references with unbounded lifetimes, but about references with larger than correct lifetimes, for example a &'static T. The Rustonomicon points out that "for most intents and purposes, [...] an unbounded lifetime can be regarded as 'static". This does not definitely imply that dereferencing a reference with larger than correct lifetime is as much defined behavior as dereferencing a reference with an unbounded lifetime. However, I don't see why rustc should handle unbounded lifetimes differently in this regard. If it did, I would expect the Rustonomicon to contain a note that it does and that an unbounded lifetime is still safer than a wrongly bound lifetime.

Conclusion – Rustonomicon on unbounded lifetimes

Dereferencing a reference with an unbounded lifetime is probably not undefined behavior according to the Rustonomicon. This may or may not extend to references with bound, but larger than correct, lifetimes; in my opinion, it does.
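The Rustonomicon's advice to bound an unbounded lifetime as quickly as possible can be sketched as follows. This is a minimal illustration under my own assumptions: the helper `first` is hypothetical and not taken from any of the quoted sources. Dereferencing the raw pointer yields a reference with an unbounded lifetime, and the function signature immediately ties it to the borrowed slice.

```rust
/// Hypothetical helper: returns the first element of a slice via a raw pointer.
/// `&*p` on its own has an unbounded lifetime; the return type immediately
/// bounds it to `'a`, the lifetime of the borrowed slice.
pub fn first<'a>(slice: &'a [i32]) -> Option<&'a i32> {
    if slice.is_empty() {
        return None;
    }
    let p: *const i32 = slice.as_ptr();
    // SAFETY: the slice is non-empty, so `p` points to a valid, aligned `i32`
    // that lives at least as long as `'a`.
    Some(unsafe { &*p })
}
```

Because the unbounded lifetime never escapes the function, callers are checked by the borrow checker as usual and cannot keep the result longer than the slice.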

Example on transmute() documentation

Quote from the standard library documentation on std::mem::transmute(), second example:

Extending a lifetime [...]. This is advanced, very unsafe Rust!

struct R<'a>(&'a i32);
unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
    std::mem::transmute::<R<'b>, R<'static>>(r)
}

[...]

This is as definitive evidence as you will get that at least having a reference with larger than correct lifetime does not result in undefined behavior – otherwise, anyone calling this function with any non-'static reference would instantly invoke undefined behavior, and this function would be very, very useless. Additionally, to me this implies that you are also allowed to dereference the references returned by extend_lifetime – what would otherwise be the benefit of that function?

Conclusion – Example on transmute() documentation

The example on the transmute() documentation seems to imply that dereferencing a reference with a larger than correct lifetime is well defined.
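A usage sketch for the documentation example might look like this. The `extend_lifetime` function is copied from the quote above; the caller `read_through_extended` is hypothetical and assumes, as argued in this answer, that dereferencing the extended reference is fine while the referent is still alive.

```rust
struct R<'a>(&'a i32);

/// Copied from the `transmute` documentation example quoted above.
unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
    std::mem::transmute::<R<'b>, R<'static>>(r)
}

/// Hypothetical caller: extends the lifetime, but only reads through the
/// result while `value` is still alive.
pub fn read_through_extended() -> i32 {
    let value = 42;
    // SAFETY (assumed, see the discussion above): `extended` never leaves
    // this function, so it cannot be used after `value` is gone.
    let extended: R<'static> = unsafe { extend_lifetime(R(&value)) };
    *extended.0
}
```

The compiler would also accept returning `extended` from the function, which is exactly the misuse that the `'static` annotation can no longer prevent.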

Final conclusion

Sadly, the documentation on the details of Unsafe Rust is still incomplete, and there are many questions about edge cases like this one that the documentation simply can't give a definitive answer to yet. However, all documentation that is remotely pertinent to the problem seems to imply that dereferencing the reference in question is in fact well-defined behavior. Whether or not this is enough for you to do something like this is up to you; it probably would be for me.

However, you really shouldn't do this

To clarify: While this code may be well defined, it is definitely still a foot gun. Passing such a reference across function borders is a bad idea, especially if the function is pub and thereby callable by other modules/crates. Even without passing it across function borders, it is still easy to mis-use this reference, resulting in your code causing undefined behavior. If you think you may need to do this, I urge you to rethink whether you can refactor your code to avoid transmuting the lifetimes. For example, it might be safer (or at least more clear that this is totally and utterly unsafe) to directly use raw pointers instead of references with incorrect lifetimes.
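The raw pointer alternative suggested above could be sketched like this. The type `CharPtr` and the function `read_via_raw_pointer` are hypothetical names chosen for illustration. The point is that the type no longer claims a lifetime it cannot guarantee, and every read has to go through an explicit `unsafe` block.

```rust
/// Hypothetical wrapper that stores a raw pointer instead of a
/// `&'static char` with a made-up lifetime.
pub struct CharPtr {
    ptr: *const char,
}

impl CharPtr {
    pub fn new(r: &char) -> Self {
        CharPtr { ptr: r as *const char }
    }

    /// # Safety
    /// The `char` passed to `new` must still be alive when this is called.
    pub unsafe fn get(&self) -> char {
        // SAFETY: guaranteed by the caller, see the contract above.
        unsafe { *self.ptr }
    }
}

/// Hypothetical caller: reads through the pointer while `x` is alive.
pub fn read_via_raw_pointer() -> char {
    let x = '🦀';
    let p = CharPtr::new(&x);
    // SAFETY: `x` is still alive here.
    unsafe { p.get() }
}
```

Compared with a transmuted `&'static char`, safe code cannot read through `CharPtr` by accident, so the responsibility for validity stays visible at each use site.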

Elias Holzmann answered Oct 12 '22 23:10


It is sound for a reference to exist with the wrong lifetime as long as it's not dangling (as long as the pointed-to value is not deallocated).

You must ensure no references to the value exist before deallocating it. It is UB for a reference to a deallocated value to exist, even if you never read or write through it. Its mere existence is instant UB.

From the reference's behavior considered undefined:

Producing an invalid value, even in private fields and locals [..]:

  • A reference or Box that is dangling, unaligned, or points to an invalid value.

It is a common myth that Rust references are "just lifetime-checked pointers". The reality is they're far more strict: they must be valid (non-dangling, aligned, pointing to a valid value) for the entire time they exist, even if you don't read/write to them. Compare with raw pointers, which only need to be valid when you read/write to them.

It is not UB per se to have a reference with the "wrong" lifetime as long as you ensure it is not dangling. Nothing in the behavior considered undefined says so. Lifetimes are just a tool for the borrow checker to enforce that references are valid. Bypassing them with transmute is not immediately UB; it just means it's now up to you to ensure all references are valid.
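The difference between raw pointers and references described above can be sketched as follows. The function name is hypothetical, and the sketch only shows the permitted side: a raw pointer that outlives its pointee without ever being dereferenced. Turning `p` back into a reference at that point (for example with `&*p`) would produce a dangling reference, which the rule quoted above forbids.

```rust
/// Hypothetical demonstration: a raw pointer may outlive its pointee as long
/// as it is never dereferenced afterwards.
pub fn dangling_raw_pointer_is_allowed() -> bool {
    let p: *const i32 = {
        let boxed = Box::new(5);
        &*boxed as *const i32
    }; // `boxed` is dropped here, so `p` now dangles
    // Inspecting the pointer value is fine; `*p` or `&*p` would be UB.
    !p.is_null()
}
```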

Dirbaio answered Oct 13 '22 01:10