Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lifetime constraints to model scoped garbage collection

Tags:

I'm working with a friend to define a safe public API for lifetimes of a "scoped" garbage collector. The lifetimes are either overly constrained and correct code does not compile or the lifetimes are too loose and they may allow invalid behavior. After trying multiple approaches, we are still stuck getting a correct API. This is especially frustrating because Rust's lifetimes can help avoid bugs in this situation but right now it just looks stubborn.

Scoped garbage collection

I am implementing an ActionScript interpreter and need a garbage collector. I studied rust-gc but it did not suit my needs. The main reason is that it requires the garbage collected values to have a static lifetime because the GC state is a thread-local static variable. I need to get garbage-collected bindings to a dynamically created host object. The other reason to avoid globals is that it is easier for me to handle multiple independent garbage-collected scopes, control their memory limits or serialize them.

A scoped garbage collector is similar to a typed-arena. You can use it to allocate values and they are all freed once the garbage collector is dropped. The difference is that you can also trigger garbage collection during its lifetime and it will clean-up the unreachable data (and is not limited to a single type).

I have a working implementation implemented (mark & sweep GC with scopes), but the interface is not yet safe to use.

Here is a usage example of what I want:

pub struct RefNamedObject<'a> {
    pub name: &'a str,
    pub other: Option<Gc<'a, GcRefCell<NamedObject<'a>>>>,
}

fn main() {
    // Initialize host settings: in our case the host object will be replaced by a string
    // In this case it lives for the duration of `main`
    let host = String::from("HostConfig");

    {
        // Create the garbage-collected scope (similar usage to `TypedArena`)
        let gc_scope = GcScope::new();

        // Allocate a garbage-collected string: returns a smart pointer `Gc` for this data
        let a: Gc<String> = gc_scope.alloc(String::from("a")).unwrap();

        {
            let b = gc_scope.alloc(String::from("b")).unwrap();
        }

        // Manually trigger garbage collection: will free b's memory
        gc_scope.collect_garbage();

        // Allocate data and get a Gc pointer, data references `host`
        let host_binding: Gc<RefNamed> = gc_scope
            .alloc(RefNamedObject {
                name: &host,
                other: None,
            })
            .unwrap();

        // At the end of this block, gc_scope is dropped with all its
        // remaining values (`a` and `host_bindings`)
    }
}

Lifetime properties

The basic intuition is that Gc can only contain data that lives as long (or longer) than the corresponding GcScope. Gc is similar to Rc but supports cycles. You need to use Gc<GcRefCell<T>> to mutate values (similar to Rc<RefCell<T>>).

Here are the properties that must be satisfied by the lifetimes of my API:

Gc cannot live longer than its GcScope

The following code must fail because a outlives gc_scope:

let a: Gc<String>;
{
    let gc_scope = GcScope::new();
    a = gc_scope.alloc(String::from("a")).unwrap();
}
// This must fail: the gc_scope was dropped with all its values
println("{}", *a); // Invalid

Gc cannot contain data that lives shorter than its GcScope

The following code must fail because msg does not live as long (or longer) as gc_scope

let gc_scope = GcScope::new();
let a: Gc<&string>;
{
    let msg = String::from("msg");
    a = gc.alloc(&msg).unwrap();
}

It must be possible to allocate multiple Gc (no exclusion on gc_scope)

The following code must compile

let gc_scope = GcScope::new();

let a = gc_scope.alloc(String::from("a"));
let b = gc_scope.alloc(String::from("b"));

It must be possible to allocate values containing references with lifetimes longer than gc_scope

The following code must compile

let msg = String::from("msg");
let gc_scope = GcScope::new();
let a: Gc<&str> = gc_scope.alloc(&msg).unwrap();

It must be possible to create cycles of Gc pointers (that's the whole point)

Similarly to the Rc<Refcell<T>> pattern, you can use Gc<GcRefCell<T>> to mutate values and create cycles:

// The lifetimes correspond to my best solution so far, they can change
struct CircularObj<'a> {
    pub other: Option<Gc<'a, GcRefCell<CircularObj<'a>>>>,
}

let gc_scope = GcScope::new();

let n1 = gc_scope.alloc(GcRefCell::new(CircularObj { other: None }));
let n2 = gc_scope.alloc(GcRefCell::new(CircularObj {
    other: Some(Gc::clone(&n1)),
}));
n1.borrow_mut().other = Some(Gc::clone(&n2));

Solutions so far

Automatic lifetime / lifetime tag

Implemented on the auto-lifetime branch

This solution is inspired by neon's handles. This lets any valid code compile (and allowed me to test my implementation) but is too loose and allows invalid code. It allows Gc to outlive the gc_scope that created it. (Violates the first property)

The idea here is that I add a single lifetime 'gc to all my structs. The idea is that this lifetime represents "how long gc_scope lives".

// A smart pointer for `T` valid during `'gc`
pub struct Gc<'gc, T: Trace + 'gc> {
    pub ptr: NonNull<GcBox<T>>,
    pub phantom: PhantomData<&'gc T>,
    pub rooted: Cell<bool>,
}

I call it automatic lifetimes because the methods never mix these struct lifetimes with the lifetime of the references they receive.

Here is the impl for gc_scope.alloc:

impl<'gc> GcScope<'gc> {
    // ...
    pub fn alloc<T: Trace + 'gc>(&self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
        // ...
    }
}

Inner/outer lifetimes

Implemented on the inner-outer branch

This implementation tries to fix the previous issue by relating Gc to the lifetime of GcScope. It is overly constrained and prevents the creation of cycles. This violates the last property.

To constrain Gc relative to its GcScope, I introduce two lifetimes: 'inner is the lifetime of GcScope and the result is Gc<'inner, T>. 'outer represents a lifetime longer than 'inner and is used for the allocated value.

Here is the alloc signature:

impl<'outer> GcScope<'outer> {
    // ...

    pub fn alloc<'inner, T: Trace + 'outer>(
        &'inner self,
        value: T,
    ) -> Result<Gc<'inner, T>, GcAllocErr> {
        // ...
    }

    // ...
}

Closure (context management)

Implemented on the with branch

Another idea was to not let the user create a GcScope manually with GcScope::new but instead expose a function GcScope::with(executor) providing a reference to the gc_scope. The closure executor corresponds to the gc_scope. So far, it either prevents the use of external references or allows to leak data to external Gc variables (first and fourth properties).

Here is the alloc signature:

impl<'gc> GcScope<'gc> {
    // ...
    pub fn alloc<T: Trace + 'gc>(&self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
        // ...
    }
}

Here is a usage example showing the violation of the first property:

let message = GcScope::with(|scope| {
    scope
        .alloc(NamedObject {
            name: String::from("Hello, World!"),
        })
        .unwrap()
});
println!("{}", message.name);

What I'd like

From what I understand, the alloc signature I'd like is:

impl<'gc> GcScope<'gc> {
    pub fn alloc<T: Trace + 'gc>(&'gc self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
        // ...
    }
}

Where everything lives as long or longer than self (the gc_scope). But this blows up with the most simple tests:

fn test_gc() {
    let scope: GcScope = GcScope::new();
    scope.alloc(String::from("Hello, World!")).unwrap();
}

causes

error[E0597]: `scope` does not live long enough
  --> src/test.rs:50:3
   |
50 |   scope.alloc(String::from("Hello, World!")).unwrap();
   |   ^^^^^ borrowed value does not live long enough
51 | }
   | - `scope` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created

I have no idea what happens here. Playground link

Edit: As explained to me on IRC, this is because I implement Drop which requires &mut self, but the scope is already borrowed in read-only mode.

Overview

Here is a quick overview of the main components of my library. GcScope contains a RefCell to its mutable state. This was introduced to not require &mut self for alloc because it "locked" the gc_scope and violated property 3: allocate multiple values. This mutable state is GcState. It keeps track of all the allocated values. The values are stored as a forward-only linked list of GcBox. This GcBox is heap-allocated and contains the actual value with some metadata (how many active Gc pointers have it as their root and a boolean flag used to check if the value is reachable from the root (see rust-gc). The value here must outlive its gc_scope so GcBox uses a lifetime, and in turn GcState must then use a lifetime as well as GcScope: this is always the same lifetime meaning "longer than gc_scope". The fact that GcScope has a RefCell (interior mutability) and lifetime is maybe the reason why I can't get my lifetimes to work (it causes invariance?).

Gc is a smart pointer to some gc_scope-allocated data. You can only get it through gc_scope.alloc or by cloning it. GcRefCell is most likely fine, it's just a RefCell wrapper adding metadata and behavior to properly support borrows.

Flexibility

I'm fine with the following requirements to get a solution:

  • unsafe code
  • nightly features
  • API changes (see for example my with approach). What matters is that I can create a temporary zone where I can manipulate garbage-collected values and that they are all dropped after this. These garbage-collected values need to be able to access longer-lived (but not static) variables outside of the scope.

The repository has a few tests in scoped-gc/src/lib.rs (compile-fail) as scoped-gc/src/test.rs.

I found a solution, I'll post it once redacted.

like image 601
Demurgos Avatar asked Mar 08 '18 22:03

Demurgos


People also ask

How does Unity handle garbage collection?

The Garbage Collector GC handles this process for us by keeping track of which memory is still in use then releasing blocks when they become unused. GC in Unity pauses the execution of all your program code until it has finished all its work.

What causes garbage collection Unity?

Garbage in Unity is simply memory that doesn't need to be used anymore. It's typically caused by creating new instances of reference-type data inside a function, such as new lists, arrays and new instances of classes.

Is garbage collection automatic in JavaScript?

Some high-level languages, such as JavaScript, utilize a form of automatic memory management known as garbage collection (GC). The purpose of a garbage collector is to monitor memory allocation and determine when a block of allocated memory is no longer needed and reclaim it.

How does C# garbage collection work?

How GC works? GC works on managed heap, which is nothing but a block of memory to store objects, when garbage collection process is put in motion, it checks for dead objects and the objects which are no longer used, then it compacts the space of live object and tries to free more memory.


1 Answers

This is one of the hardest problems I had with lifetimes with Rust so far, but I managed to find a solution. Thank you to panicbit and mbrubeck for having helped me on IRC.

What helped me to move forward was the explanation of the error I posted at the end of my question:

error[E0597]: `scope` does not live long enough
  --> src/test.rs:50:3
   |
50 |   scope.alloc(String::from("Hello, World!")).unwrap();
   |   ^^^^^ borrowed value does not live long enough
51 | }
   | - `scope` dropped here while still borrowed
   |
   = note: values in a scope are dropped in the opposite order they are created

I did not understand this error because it wasn't clear to me why scope was borrowed, for how long, or why it needs to no longer be borrowed at the end of the scope.

The reason is that during the allocation of the value, the scope is immutably borrowed for the duration of the allocated value. The issue now is that the scope contains a state object that implements "Drop": custom implementations of drop use &mut self -> it is not possible to get a mutable borrow for the drop while the value is already immutably borrowed.

Understanding that drop requires &mut self and that it is incompatible with immutable borrows unlocked the situation.

It turns out that the inner-outer approach described in the question above had the correct lifetimes with alloc:

impl<'outer> GcScope<'outer> {
    // ...

    pub fn alloc<'inner, T: Trace + 'outer>(
        &'inner self,
        value: T,
    ) -> Result<Gc<'inner, T>, GcAllocErr> {
        // ...
    }

    // ...
}

The returned Gc lives as long as GcScope and the allocated values must live longer than the current GcScope. As mentioned in the question, the issue with this solution is that it did not support circular values.

The circular values failed to work not because of the lifetimes of alloc but due to the custom drop. Removing drop allowed all the tests to pass (but leaked memory).

The explanation is quite interesting:

The lifetime of alloc expresses the properties of the allocated values. The allocated values cannot outlive their GcScope but their content must live as long or longer than GcScope. When creating a cycle, the value is subject to both of these constraints: it is allocated so must live as long or shorter than GcScope but also referenced by another allocated value so it must live as long or longer than GcScope. Because of this there is only one solution: the allocated value must live exactly as long as its scope.

It means that the lifetime of GcScope and its allocated values is exactly the same. When two lifetimes are the same, Rust does not guarantee the order of the drops. The reason why this happens is that the drop implementations could try to access each other and since there's no ordering it would be unsafe (the value might already have been freed).

This is explained in the Drop Check chapter of the Rustonomicon.

In our case, the drop implementation of the state of the garbage collected does not dereference the allocated values (quite the opposite, it frees their memory) so the Rust compiler is overly cautious by preventing us from implementing drop.

Fortunately, the Nomicon also explains how to work around these check of values with the same lifetimes. The solution is to use the may_dangle attribute on the lifetime parameter of the Drop implementation. This is as unstable attribute that requires to enable the generic_param_attrs and dropck_eyepatch features.

Concretely, my drop implementation became:

unsafe impl<'gc> Drop for GcState<'gc> {
    fn drop(&mut self) {
        // Free all the values allocated in this scope
        // Might require changes to make sure there's no use after free
    }
}

And I added the following lines to lib.rs:

#![feature(generic_param_attrs)]
#![feature(dropck_eyepatch)]

You can read more about these features:

  • generic_param_attrs
  • may_dangle

I updated my library scoped-gc with the fix for this issue if you want to take a closer look at it.

like image 174
Demurgos Avatar answered Oct 11 '22 18:10

Demurgos