Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Thread-safe data structure design

I have to design a data structure that is to be used in a multi-threaded environment. The basic API is simple: insert element, remove element, retrieve element, check that element exists. The structure's implementation uses implicit locking to guarantee the atomicity of a single API call. After i implemented this it became apparent, that what i really need is atomicity across several API calls. For example if a caller needs to check the existence of an element before trying to insert it he can't do that atomically even if each single API call is atomic:

if(!data_structure.exists(element)) {
   data_structure.insert(element);
}

The example is somewhat awkward, but the basic point is that we can't trust the result of "exists" call anymore after we return from atomic context (the generated assembly clearly shows a minor chance of context switch between the two calls).

What i currently have in mind to solve this is exposing the lock through the data structure's public API. This way clients will have to explicitly lock things, but at least they won't have to create their own locks. Is there a better commonly-known solution to these kinds of problems? And as long as we're at it, can you advise some good literature on thread-safe design?

EDIT: I have a better example. Suppose that element retrieval returns either a reference or a pointer to the stored element and not it's copy. How can a caller be protected to safely use this pointer\reference after the call returns? If you think that not returning copies is a problem, then think about deep copies, i.e. objects that should also copy another objects they point to internally.

Thank you.

like image 839
Inso Reiges Avatar asked Mar 19 '10 10:03

Inso Reiges


2 Answers

You either provide a mechanism for external locking (bad), or redesign the API, like putIfAbsent. The latter approach is for instance used for Java's concurrent data-structures.

And, when it comes to such basic collection types, you should check-out whether your language of choice already offers them in its standard library.

[edit]To clarify: external locking is bad for the user of the class, as it introduces another source of potential bugs. Yes, there are times, when performance considerations indeed make matters worse for concurrent data-structures than externally synchronized one, but those cases are rare, and then they usually can only be solved/optimized by people with far more knowledge/experience than me.

One, maybe important, performance hint is found in Will's answer below. [/edit]

[edit2]Given your new example: Basically you should try to keep the synchronization of the collection and the of the elements separated as much as possible. If the lifetime of the elements is bound to its presence in one collection, you will run into problems; when using a GC this kind of problem actually becomes simpler. Otherwise you will have to use a kind of proxy instead of raw elements to be in the collection; in the simplest case for C++ you would go and use boost::shared_ptr, which uses an atomic ref-count. Insert usual performance disclaimer here. When you are using C++ (as I suspect as you talk about pointers and references), the combination of boost::shared_ptr and boost::make_shared should suffice for a while. [/edit2]

like image 132
gimpf Avatar answered Sep 23 '22 09:09

gimpf


Sometimes its expensive to create an element to be inserted. In these scenarios you can't really afford to routinely create objects that might already exist just in case they do.

One approach is for the insertIfAbsent() method to return a 'cursor' that is locked - it inserts a place-holder into the internal structure so that no other thread can believe it is absent, but does not insert the new object. The placeholder can contain a lock so that other threads that want to access that particular element must wait for it to be inserted.

In an RAII language like C++ you can use a smart stack class to encapsulate the returned cursor so that it automatically rolls-back if the calling code does not commit. In Java its a bit more deferred with the finalize() method, but can still work.

Another approach is for the caller to create the object that isn't present, but that to occasionally fail in the actual insertion if another thread has 'won the race'. This is how, for example, memcache updates are done. It can work very well.

like image 44
Will Avatar answered Sep 25 '22 09:09

Will