Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does concurrency using immutable/persistent types and data structures work?

I've read a lot about functional languages recently. Since those use only immutable structures they claim that concurrency issues are greatly improved/solved. I'm having some serious trouble understanding how this can actually be delivered in a real-life context. Let's assume we have a web server with one thread listening on a port (well, IO is another thing i have difficulty wrapping my head around but let's just ignore that for now); On any connection attempt a socket is created and passed to a newly created thread which does some work with it and, depending on the received communications, may apply changes to a big list/data structure which is global to the server application. So, how does this list access work then in order for all threads having a consistent view of the list (or at least in order to have all changes made by one thread applied to the list as soon as the thread dies in a correct manner)?

My problems understanding are:

  • Obviously any thread can get a unchangeable "snapshot" of the list to work on. However, after "changing" the contents by creating a new version of the list with the changes applied, we are still left with every thread having their own version of the list. How are those merged back together?
  • Another method might consist of using traditional locking mechanisms like mutex/cond or go-like-channels. However, how would you even create such a thing when all variables are immutable?
  • I've heard about STM, however that cannot deal with side effects (i.e. if the list would also transparently backup the data to a file or db)

So how would you model such a thing in a functional language?

like image 943
Askaga Avatar asked Jun 03 '13 07:06

Askaga


People also ask

What is the use of immutable data structures?

Immutable data structures are usually designed so that common operations can be performed quite quickly. When a change is made, a new version of the data structure is created by referring to the old version, or part of it, and adding information about what changed.

What is an immutable data structure?

The structure of the immutable data cannot be changed. The data would be the original one and once the object is created we can not modify its state in an immutable object.

Why are immutable data structures good?

Immutable data structures provides referential transparency which makes it easier to reason about our program locally. Another way to think about it is that every time we execute a pure (referentially transparent) function with the same input, we get the same output.

What is persistent and ephemeral data structure explain in detail?

∎ An ephemeral data structure is one for. which only one version is available at a time: after an update operation, the structure as it existed before the update is lost. ∎ A persistent structure is one where. multiple versions are simultaneously accessible: after an update, both old and new versions can be used.


1 Answers

Immutable values have several applications for which they are well suited. The concurrent/parallel processing is just one of them that got more important recently. The following is really the most basic digest from experience and many books and talks about the subject. You may need to dive into some eventually.

The main example you show here is about managing global state, so it cannot be done purely "immutably". However, even here there are very good reasons to use immutable data structures. Some of those from top of my head:

  • try - catch behaves much better, because you do not modify shared object that may be left modified half-way, with immutable values, it automatically keeps the last consistent state
  • reducing changing state to just multicore safe "compare-and-swap" operations on very limited set of global variables (ideally one), completely eliminates deadlocks
  • free passing of data structures without any defensive copying that is quite often case of mysterious bugs when forgotten (many times defensive copies are created in both calling and called functions, because developers start to incline to "better safe than sorry" after a couple of debugging sessions)
  • much easier unit testing, because many functions operating on immutable values are side-effect free
  • usually easier serialization and more transparent comparison semantics much easier debugging and taking (logging) a current snapshot of the system state even asynchronously

Back to your question though.

In the most trivial case the global state in this case is often modelled using one mutable reference at the top holding onto an immutable data structure.

The reference is updated only by CAS atomic operation.

The immutable data structure is transformed by a side-effect free functions and when all transformations are done the reference is swapped atomically.

If two threads/cores want to swap simultaneously new values got from the same old one, the one doing that first wins the other does not succeed (CAS semantics) and needs to repeat the operation (depends on the transformation, either updating the current one with the new value, or transforming the new value from the beginning). This may seem wasteful, but the assumption here is that redoing some work is often cheaper than permanent locking/synchronization overhead.

Of course this can be optimized e.g. by partitioning independent parts of immutable data structures to further reduce potential collisions by having several references being updated independently.

Access to the data structure is lock-free and very fast and always gives a consistent response. Edge cases like when you send an update and another client receives older data afterwards is to be expected in any system, because of network requests can get out of order too...

STM is useful quite rarely and usually you are better of to use atomic swaps of data structure containing all values from references you would use in STM transaction.

like image 110
jJ' Avatar answered Oct 19 '22 09:10

jJ'