Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

taking a snapshot of complex mutable structure in concurrent environment

Tags:

clojure

Given: a complex structure of various nested collections, with refs scattered in different levels.

Need: A way to take a snapshot of such a structure, while allowing writes to continue to happen in other threads.

So one the "reader" thread needs to read whole complex state in a single long transaction. The "writer" thread meanwhile makes modifications in multiple short transactions. As far as I understand, in such a case STM engine utilizes the refs history.

Here we have some interesting results. E.g., reader reaches some ref in 10 secs after beginning of transaction. Writer modifies this ref each 1 sec. It results in 10 values of ref's history. If it exceeds the ref's :max-history limit, the reader transaction will be run forever. If it exceeds :min-history, transaction may be rerun several times.

But really the reader needs just a single value of ref (the 1st one) and the writer needs just the recent one. All intermediate values in history list are useless. Is there a way to avoid such history overuse?

Thanks.

like image 731
Xanth Avatar asked Nov 14 '22 12:11

Xanth


1 Answers

To me it's a bit of a "design smell" to have a large structure with lots of nested refs. You are effectively emulating a mutable object graph, which is a bad idea if you believe Rich Hickey's take on concurrency.

Some various thoughts to try out:

  • The idiomatic way to solve this problem in Clojure would be to put the state in a single top-level ref, with everything inside it being immutable. Then the reader can take a snapshot of the entire concurrent state for free (without even needing a transaction). Might be difficult to refactor to this from where you currently are, but I'd say it is best practice.
  • If you only want the reader to get a snaphot of the top level ref, you can just deref it directly outside of a transaction. Just be aware that the refs inside may continue to get mutated, so whether this is useful or not depends on the consistency requirements you have for the reader.
  • You can do everything within a (dosync...) transaction as normal for both readers and writer. You may get contention and transaction retries, but it may not be an issue.
  • You can create a "snapshot" function that quickly traverses the graph and dereferences all the refs within a transaction, returning the result with the refs stripped out (or replaced by new cloned refs). The reader calls snapshot once, then continues to do the rest of it's work after the snapshot is completed.
  • You could take a snapshot immediately each time after the writer finishes, and store it separately in an atom. Readers can use this directly (i.e. only the writer thread accesses the live data graph directly)
like image 142
mikera Avatar answered Dec 30 '22 22:12

mikera