I'm working with a friend to define a safe public API for lifetimes of a "scoped" garbage collector. The lifetimes are either overly constrained and correct code does not compile or the lifetimes are too loose and they may allow invalid behavior. After trying multiple approaches, we are still stuck getting a correct API. This is especially frustrating because Rust's lifetimes can help avoid bugs in this situation but right now it just looks stubborn.
I am implementing an ActionScript interpreter and need a garbage collector. I studied rust-gc but it did not suit my needs. The main reason is that it requires the garbage collected values to have a static lifetime because the GC state is a thread-local static variable. I need to get garbage-collected bindings to a dynamically created host object. The other reason to avoid globals is that it is easier for me to handle multiple independent garbage-collected scopes, control their memory limits or serialize them.
A scoped garbage collector is similar to a typed-arena. You can use it to allocate values and they are all freed once the garbage collector is dropped. The difference is that you can also trigger garbage collection during its lifetime and it will clean-up the unreachable data (and is not limited to a single type).
I have a working implementation implemented (mark & sweep GC with scopes), but the interface is not yet safe to use.
Here is a usage example of what I want:
pub struct RefNamedObject<'a> {
pub name: &'a str,
pub other: Option<Gc<'a, GcRefCell<NamedObject<'a>>>>,
}
fn main() {
// Initialize host settings: in our case the host object will be replaced by a string
// In this case it lives for the duration of `main`
let host = String::from("HostConfig");
{
// Create the garbage-collected scope (similar usage to `TypedArena`)
let gc_scope = GcScope::new();
// Allocate a garbage-collected string: returns a smart pointer `Gc` for this data
let a: Gc<String> = gc_scope.alloc(String::from("a")).unwrap();
{
let b = gc_scope.alloc(String::from("b")).unwrap();
}
// Manually trigger garbage collection: will free b's memory
gc_scope.collect_garbage();
// Allocate data and get a Gc pointer, data references `host`
let host_binding: Gc<RefNamed> = gc_scope
.alloc(RefNamedObject {
name: &host,
other: None,
})
.unwrap();
// At the end of this block, gc_scope is dropped with all its
// remaining values (`a` and `host_bindings`)
}
}
The basic intuition is that Gc
can only contain data that lives as long (or longer) than the corresponding GcScope
. Gc
is similar to Rc
but supports cycles. You need to use Gc<GcRefCell<T>>
to mutate values (similar to Rc<RefCell<T>>
).
Here are the properties that must be satisfied by the lifetimes of my API:
Gc
cannot live longer than its GcScope
The following code must fail because a
outlives gc_scope
:
let a: Gc<String>;
{
let gc_scope = GcScope::new();
a = gc_scope.alloc(String::from("a")).unwrap();
}
// This must fail: the gc_scope was dropped with all its values
println("{}", *a); // Invalid
Gc
cannot contain data that lives shorter than its GcScope
The following code must fail because msg
does not live as long (or longer) as gc_scope
let gc_scope = GcScope::new();
let a: Gc<&string>;
{
let msg = String::from("msg");
a = gc.alloc(&msg).unwrap();
}
Gc
(no exclusion on gc_scope
)The following code must compile
let gc_scope = GcScope::new();
let a = gc_scope.alloc(String::from("a"));
let b = gc_scope.alloc(String::from("b"));
gc_scope
The following code must compile
let msg = String::from("msg");
let gc_scope = GcScope::new();
let a: Gc<&str> = gc_scope.alloc(&msg).unwrap();
Similarly to the Rc<Refcell<T>>
pattern, you can use Gc<GcRefCell<T>>
to mutate values and create cycles:
// The lifetimes correspond to my best solution so far, they can change
struct CircularObj<'a> {
pub other: Option<Gc<'a, GcRefCell<CircularObj<'a>>>>,
}
let gc_scope = GcScope::new();
let n1 = gc_scope.alloc(GcRefCell::new(CircularObj { other: None }));
let n2 = gc_scope.alloc(GcRefCell::new(CircularObj {
other: Some(Gc::clone(&n1)),
}));
n1.borrow_mut().other = Some(Gc::clone(&n2));
Implemented on the auto-lifetime
branch
This solution is inspired by neon
's handles.
This lets any valid code compile (and allowed me to test my implementation) but is too loose and allows invalid code. It allows Gc
to outlive the gc_scope
that created it. (Violates the first property)
The idea here is that I add a single lifetime 'gc
to all my structs. The idea is that this lifetime represents "how long gc_scope lives".
// A smart pointer for `T` valid during `'gc`
pub struct Gc<'gc, T: Trace + 'gc> {
pub ptr: NonNull<GcBox<T>>,
pub phantom: PhantomData<&'gc T>,
pub rooted: Cell<bool>,
}
I call it automatic lifetimes because the methods never mix these struct lifetimes with the lifetime of the references they receive.
Here is the impl for gc_scope.alloc:
impl<'gc> GcScope<'gc> {
// ...
pub fn alloc<T: Trace + 'gc>(&self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
// ...
}
}
Implemented on the inner-outer
branch
This implementation tries to fix the previous issue by relating Gc
to the lifetime of GcScope
. It is overly constrained and prevents the creation of cycles. This violates the last property.
To constrain Gc
relative to its GcScope
, I introduce two lifetimes: 'inner
is the lifetime of GcScope
and the result is Gc<'inner, T>
. 'outer
represents a lifetime longer than 'inner
and is used for the allocated value.
Here is the alloc signature:
impl<'outer> GcScope<'outer> {
// ...
pub fn alloc<'inner, T: Trace + 'outer>(
&'inner self,
value: T,
) -> Result<Gc<'inner, T>, GcAllocErr> {
// ...
}
// ...
}
Implemented on the with
branch
Another idea was to not let the user create a GcScope
manually with GcScope::new
but instead expose a function GcScope::with(executor)
providing a reference to the gc_scope
. The closure executor
corresponds to the gc_scope
. So far, it either prevents the use of external references or allows to leak data to external Gc
variables (first and fourth properties).
Here is the alloc signature:
impl<'gc> GcScope<'gc> {
// ...
pub fn alloc<T: Trace + 'gc>(&self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
// ...
}
}
Here is a usage example showing the violation of the first property:
let message = GcScope::with(|scope| {
scope
.alloc(NamedObject {
name: String::from("Hello, World!"),
})
.unwrap()
});
println!("{}", message.name);
From what I understand, the alloc
signature I'd like is:
impl<'gc> GcScope<'gc> {
pub fn alloc<T: Trace + 'gc>(&'gc self, value: T) -> Result<Gc<'gc, T>, GcAllocErr> {
// ...
}
}
Where everything lives as long or longer than self
(the gc_scope
). But this blows up with the most simple tests:
fn test_gc() {
let scope: GcScope = GcScope::new();
scope.alloc(String::from("Hello, World!")).unwrap();
}
causes
error[E0597]: `scope` does not live long enough
--> src/test.rs:50:3
|
50 | scope.alloc(String::from("Hello, World!")).unwrap();
| ^^^^^ borrowed value does not live long enough
51 | }
| - `scope` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
I have no idea what happens here. Playground link
Edit: As explained to me on IRC, this is because I implement Drop
which requires &mut self
, but the scope
is already borrowed in read-only mode.
Here is a quick overview of the main components of my library.
GcScope
contains a RefCell
to its mutable state. This was introduced to not require &mut self
for alloc
because it "locked" the gc_scope and violated property 3: allocate multiple values.
This mutable state is GcState
. It keeps track of all the allocated values. The values are stored as a forward-only linked list of GcBox
. This GcBox
is heap-allocated and contains the actual value with some metadata (how many active Gc
pointers have it as their root and a boolean flag used to check if the value is reachable from the root (see rust-gc). The value here must outlive its gc_scope
so GcBox
uses a lifetime, and in turn GcState
must then use a lifetime as well as GcScope
: this is always the same lifetime meaning "longer than gc_scope
". The fact that GcScope
has a RefCell
(interior mutability) and lifetime is maybe the reason why I can't get my lifetimes to work (it causes invariance?).
Gc
is a smart pointer to some gc_scope
-allocated data. You can only get it through gc_scope.alloc
or by cloning it.
GcRefCell
is most likely fine, it's just a RefCell
wrapper adding metadata and behavior to properly support borrows.
I'm fine with the following requirements to get a solution:
with
approach). What matters is that I can create a temporary zone where I can manipulate garbage-collected values and that they are all dropped after this. These garbage-collected values need to be able to access longer-lived (but not static) variables outside of the scope.The repository has a few tests in scoped-gc/src/lib.rs
(compile-fail) as scoped-gc/src/test.rs
.
I found a solution, I'll post it once redacted.
The Garbage Collector GC handles this process for us by keeping track of which memory is still in use then releasing blocks when they become unused. GC in Unity pauses the execution of all your program code until it has finished all its work.
Garbage in Unity is simply memory that doesn't need to be used anymore. It's typically caused by creating new instances of reference-type data inside a function, such as new lists, arrays and new instances of classes.
Some high-level languages, such as JavaScript, utilize a form of automatic memory management known as garbage collection (GC). The purpose of a garbage collector is to monitor memory allocation and determine when a block of allocated memory is no longer needed and reclaim it.
How GC works? GC works on managed heap, which is nothing but a block of memory to store objects, when garbage collection process is put in motion, it checks for dead objects and the objects which are no longer used, then it compacts the space of live object and tries to free more memory.
This is one of the hardest problems I had with lifetimes with Rust so far, but I managed to find a solution. Thank you to panicbit and mbrubeck for having helped me on IRC.
What helped me to move forward was the explanation of the error I posted at the end of my question:
error[E0597]: `scope` does not live long enough
--> src/test.rs:50:3
|
50 | scope.alloc(String::from("Hello, World!")).unwrap();
| ^^^^^ borrowed value does not live long enough
51 | }
| - `scope` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
I did not understand this error because it wasn't clear to me why scope
was borrowed, for how long, or why it needs to no longer be borrowed at the end of the scope.
The reason is that during the allocation of the value, the scope
is immutably borrowed for the duration of the allocated value. The issue now is that the scope contains a state object that implements "Drop": custom implementations of drop
use &mut self
-> it is not possible to get a mutable borrow for the drop while the value is already immutably borrowed.
Understanding that drop requires &mut self
and that it is incompatible with immutable borrows unlocked the situation.
It turns out that the inner-outer approach described in the question above had the correct lifetimes with alloc
:
impl<'outer> GcScope<'outer> {
// ...
pub fn alloc<'inner, T: Trace + 'outer>(
&'inner self,
value: T,
) -> Result<Gc<'inner, T>, GcAllocErr> {
// ...
}
// ...
}
The returned Gc
lives as long as GcScope
and the allocated values must live longer than the current GcScope
. As mentioned in the question, the issue with this solution is that it did not support circular values.
The circular values failed to work not because of the lifetimes of alloc
but due to the custom drop
. Removing drop
allowed all the tests to pass (but leaked memory).
The explanation is quite interesting:
The lifetime of alloc
expresses the properties of the allocated values. The allocated values cannot outlive their GcScope
but their content must live as long or longer than GcScope
. When creating a cycle, the value is subject to both of these constraints: it is allocated so must live as long or shorter than GcScope
but also referenced by another allocated value so it must live as long or longer than GcScope
. Because of this there is only one solution: the allocated value must live exactly as long as its scope.
It means that the lifetime of GcScope
and its allocated values is exactly the same. When two lifetimes are the same, Rust does not guarantee the order of the drops. The reason why this happens is that the drop
implementations could try to access each other and since there's no ordering it would be unsafe (the value might already have been freed).
This is explained in the Drop Check chapter of the Rustonomicon.
In our case, the drop
implementation of the state of the garbage collected does not dereference the allocated values (quite the opposite, it frees their memory) so the Rust compiler is overly cautious by preventing us from implementing drop
.
Fortunately, the Nomicon also explains how to work around these check of values with the same lifetimes. The solution is to use the may_dangle
attribute on the lifetime parameter of the Drop
implementation.
This is as unstable attribute that requires to enable the generic_param_attrs
and dropck_eyepatch
features.
Concretely, my drop
implementation became:
unsafe impl<'gc> Drop for GcState<'gc> {
fn drop(&mut self) {
// Free all the values allocated in this scope
// Might require changes to make sure there's no use after free
}
}
And I added the following lines to lib.rs
:
#![feature(generic_param_attrs)]
#![feature(dropck_eyepatch)]
You can read more about these features:
generic_param_attrs
may_dangle
I updated my library scoped-gc with the fix for this issue if you want to take a closer look at it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With