Concurrent garbage collection for a C++ graph data structure

I have a directed graph data structure used for audio signal processing (see http://audulus.com if you're curious).

I would like graph edges to be strong references, so in the absence of cycles, std::shared_ptr would do the trick. Alas, there are potentially cycles in the graph.

So, I had this idea for a simple concurrent mark-sweep collector:

The mutator thread sends events to the collector thread. The collector thread maintains its own representation of the graph and does not traverse the mutator thread's graph. The collector thread just uses mark-sweep at regular intervals.

The events would be the following (in function call form); a rough sketch of how they might be represented appears after the list:

  • AddRoot(Node*)
  • RemoveRoot(Node*)
  • AddEdge(Node*, Node*)
  • RemoveEdge(Node*, Node*)
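
A rough sketch of what I have in mind for the event stream (the names and types here are illustrative, not my actual code; I'm assuming a boost::lockfree::queue as the channel between the threads):

    #include <boost/lockfree/queue.hpp>

    struct Node;  // mutator-side node; the collector treats the pointer as an opaque ID

    enum class EventType { AddRoot, RemoveRoot, AddEdge, RemoveEdge };

    struct Event {
        EventType type;
        Node* a;   // the node, or the edge's source
        Node* b;   // the edge's target; null for root events
    };

    // Shared between the mutator thread(s) and the collector thread.
    boost::lockfree::queue<Event> events(1024);

    // Mutator side: record a new edge a -> b.
    void AddEdge(Node* a, Node* b) {
        while (!events.push(Event{EventType::AddEdge, a, b})) { /* queue full; retry */ }
    }

    // Collector side: drain pending events into a private shadow graph,
    // then mark from the roots and delete anything unmarked.
    void CollectorStep() {
        Event e;
        while (events.pop(e)) {
            // apply e to the collector's own representation of the graph
        }
        // ... mark-sweep over the shadow graph ...
    }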

Is this scheme correct? The collector thread sees an older version of the graph than the mutator thread does. My intuition is that since a node that is unreachable at an earlier time will still be unreachable at any later time, the collector thread may safely delete an unreachable object as soon as it finds one.

Also, if it's correct for one mutator thread, would it work for multiple mutator threads?

UPDATE

I've released the code here: https://github.com/audulus/collector. The code is actually fairly general purpose. Use RootPtr<T> to automatically keep track of root nodes. Links between nodes are managed using EdgePtr<T>.
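
Roughly how the pointers get used (a simplified sketch; the constructor and assignment syntax here is assumed, so check the repo for the actual interface):

    // Simplified usage sketch -- everything beyond the RootPtr/EdgePtr names is
    // illustrative, and this assumes the templates from the repo are included.
    struct AudioNode {
        EdgePtr<AudioNode> output;   // a strong edge to another node, tracked by the collector
    };

    void Example() {
        RootPtr<AudioNode> a(new AudioNode);   // 'a' is reachable while this root exists
        RootPtr<AudioNode> b(new AudioNode);
        a->output = b;                         // records an edge a -> b with the collector
        b->output = a;                         // a cycle is fine: both nodes are still rooted
    }   // roots gone; once no roots reach them, the collector eventually frees both
        // nodes, even though they reference each other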

The collector seems to work for multiple mutator threads (both in my app and in unit tests), but I feel like a proof of correctness is needed.

PLEASE NOTE (in response to @AaronGolden's comment below; judging from the comments, people aren't reading this): The mutator thread is responsible for calling the collector functions in the correct order. For example, if the mutator thread calls RemoveEdge(a,b) before assigning b to a RootPtr, the collector may intervene and collect b.
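
To make the ordering requirement concrete (illustrative pseudocode with assumed accessors; the point is the order of operations, not the exact API):

    // WRONG: the edge is removed before b becomes a root, so the collector
    // may see b as unreachable and free it.
    void Wrong(RootPtr<Node>& a) {
        Node* b = a->output.get();      // raw pointer, not a root
        a->output = nullptr;            // emits RemoveEdge(a, b)
        RootPtr<Node> keep(b);          // too late: b may already be collected
    }

    // RIGHT: make b a root first, then remove the edge.
    void Right(RootPtr<Node>& a) {
        RootPtr<Node> keep(a->output);  // emits AddRoot(b) while the edge still exists
        a->output = nullptr;            // now RemoveEdge(a, b) cannot orphan b
    }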

UPDATE 2:

I've updated the code to my latest version and updated the link above. I've now used the code in my app for over a year and haven't found any bugs attributable to it.

Asked Jun 14 '13 by Taylor

1 Answer

One argument I find somewhat persuasive (though I would hesitate to call it a proof) that this scheme works is that in the absence of cycles, the scheme is equivalent to reference counting with atomic reference counts.

In the absence of cycles, AddRoot and AddEdge map to incrementing a reference count, and RemoveRoot and RemoveEdge map to decrementing one. Pushing an event onto the queue (I use boost::lockfree::queue) is an atomic operation, just as updating a reference count is.
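
To make that concrete, here is roughly what the acyclic case reduces to if you interpret each event as an atomic count update (a sketch; the names are illustrative):

    #include <atomic>

    enum class Ev { AddRoot, RemoveRoot, AddEdge, RemoveEdge };

    struct Counted { std::atomic<int> refs{0}; };

    // 'target' is the root being added/removed, or the target of the edge.
    void Apply(Ev e, Counted* target) {
        switch (e) {
            case Ev::AddRoot:
            case Ev::AddEdge:       // a new strong reference into 'target'
                target->refs.fetch_add(1, std::memory_order_relaxed);
                break;
            case Ev::RemoveRoot:
            case Ev::RemoveEdge:
                // The last reference is gone: with no cycles, 'target' is
                // unreachable and stays unreachable, so free it immediately.
                if (target->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
                    delete target;
                break;
        }
    }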

So the remaining question is: how do cycles change the picture in terms of correctness? To wave my hands a bit: cycles are a property of the connectivity of the graph, but they don't affect the atomicity of the operations or let one thread learn something earlier than it otherwise would (which could cause a bad ordering of operations).

This suggests that if there is a counterexample to the scheme, it will involve playing some game with cycles.

Answered Nov 05 '22 by Taylor