Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I organize multithreaded access to a graph?

I'm elaborating over a problem which seems hard to me and I'm not expecting a simple solution, but maybe there are proven practices or further reading that might make this easier. I'm pretty sure that the general problem pops up in many applications (for example garbage collection or transactional databases).

My application has a graph (a DAG if it matters) which is being traversed by multiple threads simultaneously. Some of these are just trying to find certain nodes or retrieve a subgraph, others may change the graph's structure.

The policy I want to implement is that a reading thread will perform its entire operation on a "snapshot" of the graph, i. e. see the structure as it was at a certain point in time.

My current plan is to set up something similar to row versioning in transactional DBs, i. e. a reading thread first obtains a current version number and then only visits graph nodes and edges that have this version number or earlier. Writing threads would then put an incremented version number on new elements (changed elements would be cloned first), making them invisible for running readers. A writing thread can then "commit" its new version when it has successfully finished, and readers would "release" their version number, making deleted elements eligible for removal.

This strategy is still sketchy and has a number of unsolved issues such as concurrent write access, but generally it seems like a viable road to go.

like image 548
Hanno Fietz Avatar asked Jul 21 '09 14:07

Hanno Fietz


2 Answers

an alternative would be using Persistent Data Structures. They are data structure which always preserves the previous version of itself when it is modified.

They are like a log file, the modified version is always appendend last making them immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. Programming languages like Clojure lately polularized this approach (at least to me).

like image 121
dfa Avatar answered Nov 02 '22 05:11

dfa


Your approach seems sound to me ...

What you might have problems with however is ensuring causality between writes on separate subgraphs.

If a writer alters subgraph A and another writer alters distinct subgraph B, but other read/write operations happen on subgraph C where A and B are in C then you need to ensure that the version of subgraph C aligns correctly with the versions of B and A.

I would suggest a locking scheme in the DAG that incorporates subgraph locking for multiple read/single write from a given root. You will need to grep the graph for cyclic dependencies however to ensure that you don't get into a starvation/deadlock state within the graph.

If your graph is distributed or your concurrent access has latency then your transaction system will be harder to implement and will likely require additional safeguards.

Your versioning approach sounds fine providing your locking provisions ensure, in all cases that a set of revisions for nodes at any time T represent an integral state of the graph. The set of nodes and revisions at T = {n0, n1, n2, n3} and a concurrent revision bumping of subgraphs will pose the headache in keeping the entire set of revisions and nodes at T integral.

As dfa suggests above, the set of nodes and revisions at some point represents a changeset of the entire structure. Providing it has integrity, this represents the entire structure at some point in time.

Remember: Time, Integrity, Causality and Concurrency

Good Luck!

like image 40
Aiden Bell Avatar answered Nov 02 '22 05:11

Aiden Bell