Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there an existing solution for these particular multithreaded data structure requirements?

I've had the need for a multi-threaded data structure that supports these claims:

  • Allows multiple concurrent readers and writers
  • Is sorted
  • Is easy to reason about

Fulfilling multiple readers and one writer is a lot easier, but I really would wan't to allow multiple writers.

I've been doing research into this area, and I'm aware of ConcurrentSkipList (by Lea based on work by Fraser and Harris) as it's implemented in Java SE 6. I've also implemented my own version of a concurrent Skip List based on A Provably Correct Scalable Concurrent Skip List by Herlihy, Lev, Luchangco and Shavit.

These two implementations are developed by people that are light years smarter then me, but I still (somewhat ashamed, because it is amazing work) have to ask the question if these are the two only viable implementations of a concurrent multi reader/writer data structures available today?

like image 814
thr Avatar asked Dec 06 '09 20:12

thr


2 Answers

Sounds to me like your making this problem too hard for yourself. Consider the following:

  • Its pretty easy to implement immutable versions of many data structures, especially trees. Immutable data structures have the benefit that, by virtue of being immutable, one thread can't modify the collection under another threads nose. Immutability = no race conditions = no locks = no deadlocking. Awesomeness.

    See Okasaki's Purely Functional Data Structures, which provides ML and Haskell implementations of heaps, balanced trees, stacks, queues, and some other data structures.

  • Threads can't see changes to an immutable data structure made in other threads. They can, however, notify one another explicitly of changes using message-passing concurrency.

Locks and mutexes are too low-level, and mutable state is pretty much the enemy of multithreaded programming. If you think about whatever problem your trying to solve in terms immutability and message passing, then it'll become 1000x easier for you.

like image 151
Juliet Avatar answered Oct 21 '22 13:10

Juliet


Well, you already identified the one I would typically suggest - the concurrent skiplist, but the absence of other specific requirements other than the three above, I think a simple linked list with per-node mutexes will work:

Each node contains one element (or a reference), and one simple mutex. I'll assume Java here, because it helps avoid race conditions around node reclamation.

Searching through the list consists of iterating through the nodes from the head, no need to get any locks (although you'll need to ensure visibility between threads at some point - you can choose how frequently this happens - once per search may be good enough).

To add an item, do a search until you find the immediate predecessor and successor nodes for the value you want to insert, lock the mutex associated with the prior node, check the successor node again (it might have changed out from under you), then splice in the new node.

Deletion works similarly - find the predecessor node for the node you want to delete, lock the predecessor node, check that it is still the predecessor node, and take it out of the tree.

This structure is sorted (to the extent that it is correct - I haven't proven that!).

This structure clearly allows multiple readers (readers are never blocked for any reason), and multiple writers, although writers trying to manipulate the same portion of the list (e.g., two threads inserting nodes which have the same splice point) will wait on each other.

The structure seems relatively easy to reason about - being based on a single linked list, with a fairly simple locking structure and with a few simple invariants. I haven't spend more than a few minutes reasoning about its correctness, however. You can make it even simpler to reason about, at the cost of performance, by making the locking policy more heavyweight - for insertion and deletion lock each node during iteration and then lock the successor before unlocking the prior node - in this way you'll have both nodes locked when you find the splice or deletion point, so no "double-check, backtrack" is needed.

You may be able to get rid of the locks completely and use a lock-free list while maintaining your "must be sorted" condition, but I haven't thought about it in depth - at a minimum I suspect it will be "harder to reason about".

In C++ the structure is more involved, since you can't rely on the GC to keep the nodes around as long as readers might be looking at them, so the simple strategy of allowing the readers to mosey around in a lock-free manner doesn't fly, if you ever want to delete nodes. You can adjust it by making readers take the lock of each visited node, but this sucks for both performance (obvious) and concurrency (because although you can have multiple readers and writers in some basic way, they can never pass each other anymore, so practical concurrency is heavily limited).

An alternative would be to use rwlocks in each node, readers taking read locks only and inserters/deleters taking read locks to find the position to work on, then upgrading to write. Currency is at least improved since readers can pass readers, but writers still block readers (so all readers that are iterating at a position before some node on which a write operation starts will not be able to iterate past that node until the operation completes).

Now, all that said (whew!) I should mention that other that "easy to reason about" this structure seems pretty much inferior to the concurrent skip list in every material way, with the possible exception of slightly less memory usage (maybe). It doesn't have the log(N) search behavior, in particular. It is not fully lock-free (writers may wait on writers in some cases). I'm not even sure it is that much easier to reason about as far as concurrency goes, although the underlying structure is simple.

I think if you really wanted you could extend this kind of "per-node" locking to something like an rb-tree if you wanted, but it not easy. In particular, you need to find some kind of node to lock prior to any tree rotations that is "high enough" that you can be sure that any thread wanting to perform a modification that will affect the correctness of your rotation will also try to lock the same node. In the linked list, this is the "predecessor node" - in an AVL or RB-tree it is not so simple.

like image 42
BeeOnRope Avatar answered Oct 21 '22 11:10

BeeOnRope