Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When are lock free data structures less performant than mutual exclusion (mutexes)?

I read somewhere (can't find the page anymore) that lock free data structures are more efficient "for certain workloads" which seems to imply that sometimes they're actually slower or the gain from them can be zero in some situations. Taking the ~100 cycle hit of a lock instruction to do an atomic op sounds plenty faster to me than going to sleep and waiting for the scheduler to wake the process back up, so it's not obvious to me under what circumstances a lock free data structure would be less preferable than old fashioned mutexes. If the lock is available 99% of the time and the process doesn't have to go to sleep, is a mutex then faster? Is there a good rule of the thumb for knowing which way to go assuming a suitable lock free data structure is available?

like image 541
Joseph Garvin Avatar asked Oct 18 '09 19:10

Joseph Garvin


People also ask

Is lock-free always faster?

Note that lock-free is not always faster because contention is complicated. Also note that lock-free can scale worse, again due to contention.

Are lock-free queues faster?

Lockfree queue is slower than using mutexes.

What are lock-free data structures?

A data structure provides lock-freedom if, at any time, at least one thread can proceed. All other threads may be starving. The difference to obstruction-freedom is that there is at least one non-starving thread even if no threads are suspended.

Are mutexes locks?

A Mutex is a lock that we set before using a shared resource and release after using it. When the lock is set, no other thread can access the locked region of code.


2 Answers

A common approach to implementing a lock-free data structure is to have a mutable reference to an immutable object, and have anything that wants to change the structure grab the reference, produce a new version of the object with suitable changes applied, and then CompareExchange the reference to point to the new object. If the CompareExchange works, great. If not, ditch the new object, re-grab the reference, and start over.

This can work well if producing the new object is cheap and the level of contention is low enough that the CompareExchange will usually work. If there is considerable contention, and if producing the new object is slow, simultaneous attempted updates by N threads may take N^2 time to complete. As an extreme example, suppose 100 threads are running on a CPU, an update takes 100ms of CPU time (just over a time-slice), and all 100 threads attempt to update an object at once. During the first ten seconds, each thread will produce a new object based on the original one. One of the threads will successfully do the CompareExchange, while the others will all fail. Then during the next 9.9 seconds, 99 threads will generate new versions of the object, after which one will successfully post its update and 98 will fail. The net effect will be that the lock-free method will take 505 seconds' worth of CPU time to perform 100 updates, when a locking system could have done them in about 10 seconds.

like image 153
supercat Avatar answered Oct 02 '22 16:10

supercat


lockless data structures will, one way or another, use atomic semantics from your architecture to perform its core operations. When you do this, you can be using the machines entire internal exclusion mechanisms to ensure correct ordering or fencing of data. A mutex or critical section also does this, but it only does it once for a single flag. Where the mutex or critical section is slow, is when the the lock acquisition fails (there is contention). In this case, the OS also invokes the scheduler to suspend the thread until the exclusion object has been released.

So it seems logical that whenever your lock-less data structure uses multiple atomic operations per core method when a single lock shielding a critical section could provide the same semantics AND there tends to be very little contention, in practice, for the data structure in question, then in fact, it does make more sense to use an OS-provided locking mechanism, than trying to build your own.

like image 43
Paul Hsieh Avatar answered Oct 02 '22 15:10

Paul Hsieh