
Why does std::thread::Scope::spawn need the bound `T: 'scope`?

Tags: rust, lifetime

The signature of std::thread::Scope::spawn looks like this:

pub fn spawn<F, T>(&'scope self, f: F) -> ScopedJoinHandle<'scope, T>
where
    F: FnOnce() -> T + Send + 'scope,
    T: Send + 'scope,

I understand why the requirement F: 'scope is there: the thread is only guaranteed to be joined by the time the call to scope returns, so it must not capture references with lifetimes smaller than the call to scope as it may otherwise use them from the child thread after the lifetime is exited on the parent thread.
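To make the F: 'scope case concrete, here is a minimal sketch of the pattern the bound allows. The helper name sum_in_scoped_thread is made up for illustration; the closure borrows data, which outlives the call to scope, so the bound is satisfied.

```rust
use std::thread;

// Hypothetical helper (not from std): sums a slice on a scoped thread.
// The closure borrows `data`, which lives longer than the call to
// `thread::scope`, so the `F: 'scope` bound is satisfied.
fn sum_in_scoped_thread(data: &[i32]) -> i32 {
    thread::scope(|s| {
        let handle = s.spawn(|| data.iter().sum::<i32>());
        // The thread is joined here, before `scope` returns.
        handle.join().unwrap()
    })
}

fn main() {
    let numbers = vec![1, 2, 3, 4];
    let total = sum_in_scoped_thread(&numbers);
    println!("{total}"); // prints 10
}
```

A closure that borrowed a variable declared inside the closure passed to scope would be rejected, because that borrow could end before the thread is joined.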

But I'm wondering about the requirement T: 'scope. I understand this to mean that the return type of F must not include references with a lifetime smaller than the call to scope. I suppose this makes sense for a similar reason, but I don't see how this could possibly go wrong given the requirement F: 'scope. How could a function with access only to lifetimes that outlive 'scope return a reference with lifetime shorter than 'scope without the borrow checker complaining about that function in isolation, before it's even passed to spawn?

Is the T: 'scope requirement redundant?


I'm looking for an answer that provides one of the following:

  • An example of an unsound caller that is accepted without the bound.
  • An example of a sound caller that stops being accepted if the bound is removed.
  • Some citation that proves the bound is redundant, i.e. there cannot be any example of either type above.
jacobsa asked Mar 28 '26 01:03



1 Answer

This is a surprisingly complex situation, because Rust's type system contains soundness bugs and one of them is relevant in this case. A good summary is "there's a soundness bug related to functions that outlive their return values – it is exploitable without using spawn, but the bounds on spawn ensure that there isn't an alternative exploit that uses spawn instead". Depending on how the bug ends up being fixed, the bound might be either necessary or unnecessary in the fixed version of the type system.


Rust's type system assumes that you can't create a value with a lifetime parameter 'a outside the lifetime 'a itself. If you didn't have the T: 'scope requirement, then, if you found a function F: 'scope that returned a T with a lifetime shorter than 'scope, you would be able to pass F to spawn and eventually get a T outside its lifetime from the ScopedJoinHandle.
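For contrast, here is a sketch of a return type that does satisfy T: 'scope, so the value taken out of the ScopedJoinHandle is still inside its lifetime. The function name longest_word is hypothetical; the point is only that the returned &str borrows from text, which outlives the scope.

```rust
use std::thread;

// Hypothetical helper (not from std): the scoped thread returns a `&str`
// that borrows from `text`. Since `text` outlives the call to
// `thread::scope`, the return type satisfies `T: 'scope`.
fn longest_word(text: &str) -> &str {
    thread::scope(|s| {
        let handle = s.spawn(|| {
            text.split_whitespace()
                .max_by_key(|w| w.len())
                .unwrap_or("")
        });
        // The joined value is a reference that is still valid here,
        // because the borrowed data is owned outside the scope.
        handle.join().unwrap()
    })
}

fn main() {
    let text = String::from("a bb ccc");
    println!("{}", longest_word(&text)); // prints ccc
}
```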

It turns out that it is in fact possible to create such a function. All you have to do is write a closure that captures a static reference and coerces it to a reference with a shorter lifetime in order to return it. For example:

trait StaticFn<T>: 'static + Fn() -> T {}
impl<T, F: 'static + Fn() -> T> StaticFn<T> for F {}

struct Lifetimed<'a>(&'a ());
static LIFETIMED: Lifetimed<'static> = Lifetimed(&());

struct S<'a>(&'a str);
impl<'a> S<'a> {
    fn maker(&self) -> impl StaticFn<&'a Lifetimed<'a>> {
        || &LIFETIMED
    }
}

pub fn main() {
    // lifetime of the string in scope is 'a
    let scope = String::from("Hello, world!");
    let string_in_scope = &*scope;
    let t = S(string_in_scope);
    let lifetime_maker = t.maker();
    drop(scope); // 'a ends here
    
    // Now create a Lifetimed<'a> even though 'a has ended
    let lifetimed = (lifetime_maker)();
}

This works because the closure is capturing only 'static values (making it 'static), and it's possible to coerce &'static T into &'a T for any lifetime 'a because you can always shorten the lifetime of a shared reference.
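The lifetime-shortening step on its own is ordinary, sound Rust. As a minimal sketch (the names MESSAGE and shortened are invented for illustration), a &'static str can be returned wherever a &'a str is expected:

```rust
static MESSAGE: &str = "stored in static memory";

// Hypothetical helper: the return type is tied to the lifetime of
// `_anchor`, even though the data it points to is 'static. The compiler
// accepts this because a shared reference's lifetime can always be
// shortened.
fn shortened<'a>(_anchor: &'a i32) -> &'a str {
    MESSAGE
}

fn main() {
    let local = 5;
    let s: &str = shortened(&local);
    println!("{s}");
}
```

The problem in the example above is not this coercion by itself, but that the closure performing it is still considered 'static.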

It's then possible to do various impossible things with the resulting value-that-exists-outside-its-lifetime. Here's a simple demonstration of how this sort of value could be used to break the type system:

pub fn main() {
    /* ... previous body of main goes here ... */
    lifetimed.forward(outside_lifetime_is_longer);
}

impl<'a> Lifetimed<'a> {
    fn forward<T>(&self, f: impl Fn(&'a ()) -> T) -> T {
        (f)(self.0)
    }
}
struct AIsLonger<'b, 'a: 'b>(&'b &'a ());
fn outside_lifetime_is_longer<'a>(outside: &'a ()) {
    let ail = AIsLonger::<'_, 'a>(&outside);
}

The type of AIsLonger requires that 'a outlives 'b, i.e. whenever 'b is live, 'a is also live. outside_lifetime_is_longer uses the assumption that its lifetime parameter 'a is alive at the time to create a value ail of type AIsLonger that proves that 'a (the lifetime of string_in_scope) outlives 'b (the lifetime of the outside parameter). So we've created code that proves a relationship between lifetimes that isn't actually true, which is unsound (and can, with much more complicated code than this, eventually be used to cause a use-after-free).
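As a sketch of why a type like this counts as "proof" to the compiler, here is a sound function that relies on the same kind of relationship. The name pick_inner is hypothetical; the parameter type &'b &'a str is only well-formed when 'a outlives 'b, and the body is allowed to assume that.

```rust
// Hypothetical helper: the parameter type `&'b &'a str` is only well-formed
// when 'a: 'b, so the compiler may assume that bound inside the body.
fn pick_inner<'b, 'a>(r: &'b &'a str) -> &'b str {
    // `*r` is a `&'a str`; shortening it to `&'b str` relies on 'a: 'b.
    *r
}

fn main() {
    let text = String::from("outer data");
    let inner: &str = &text;
    let picked = pick_inner(&inner);
    println!("{picked}");
}
```

When the relationship is genuinely true, as it is here, this reasoning is fine. The exploit works by getting the compiler to accept such a type for lifetimes where the relationship does not hold.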

You might notice that I didn't actually use Scope::spawn at all in the example above. Without the bound you asked about, it would be possible to use Scope::spawn on a 'static function that returns a lifetimed value in order to produce a value outside its own lifetime: spawn the thread while inside the lifetime, using the 'static function as the function the thread runs, and then join the thread afterwards. Because the function is 'static, the bound on the return type is the only thing that prevents you from doing that. But there's a much simpler way to perform the same exploit: just wait for the value's lifetime to end, then call the function. This is a soundness bug in Rust's type system that's tracked in Rust issue 84366, which also contains examples of how to produce an actual use-after-free using this sort of code.

So, is the bound necessary? The answer is both yes and no! It's clearly necessary because being able to create a value of a type outside the lifetime of that type is unsound and can cause use-after-free in practice (and based on the documentation for thread::spawn, this was the original motivation for adding the bound). It's also clearly unnecessary because it doesn't let you do anything that you couldn't do by just calling the function directly, without using a thread. Because the assumptions that the type system makes are self-contradictory, you can end up with a contradiction like this if you are trying to reason using the rules of the type system.

I am glad that the bound is there, though. Rust is going to have to make some sort of breaking change in order to fix the soundness bug in the type system, and the bound means that this particular type system exploit can't be done via spawn – it has to be done by a direct call to the function instead. So that reduces the number of possible code patterns that might need changes made to them as a consequence of the change that fixes the bug.

ais523 answered Mar 31 '26 06:03
