Spinlock vs std::mutex::try_lock

Tags:

c++

mutex

spinlock

What are the benefits of using a specifically designed spinlock (e.g. http://anki3d.org/spinlock) vs. code like this:

#include <mutex>

std::mutex m;
while (!m.try_lock()) {}  // busy-wait until the lock is acquired
// do work
m.unlock();

asked Feb 11 '16 05:02

user2411693

2 Answers

On typical hardware, there are massive benefits:

1. Your naive "fake spinlock" may saturate internal CPU buses while the CPU spins, starving other physical cores including the physical core that holds the lock.
2. If the CPU supports hyper-threading or something similar, your naive "fake spinlock" may consume excessive execution resources on the physical core, starving another thread sharing that physical core.
3. Your naive "fake spinlock" probably does extraneous write operations that result in bad cache behavior. When you perform a read-modify-write operation on an x86/x86_64 CPU (like the compare/exchange that try_lock probably does), it always writes even if the value isn't changed. This write causes the cache line to be invalidated on other cores, requiring them to re-share it when another core accesses that line. This is awful if threads on other cores contend for the same lock at the same time.
4. Your naive "fake spinlock" interacts badly with branch prediction. When you finally do get the lock, you take the mother of all mispredicted branches right at the point where you are locking out other threads and need to execute as quickly as possible. This is like a runner being all pumped up and ready to run at the starting line but then when he hears the starting pistol, he stops to catch his breath.

Basically, that code does everything wrong that it is possible for a spinlock to do wrong. Absolutely nothing is done efficiently. Writing good synchronization primitives requires deep hardware expertise.
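As a rough illustration of what "doing it right" usually means, here is a minimal sketch of a test-and-test-and-set spinlock built on std::atomic<bool>. It is not the implementation from the linked anki3d.org page; the names Spinlock, cpu_relax and parallel_increment are hypothetical, and the use of the x86 _mm_pause hint is an assumption about the target (on other architectures cpu_relax compiles to nothing).

```cpp
#include <atomic>
#include <mutex>   // std::lock_guard
#include <thread>
#include <vector>
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>  // _mm_pause
#endif

// Spin-wait hint: the x86 PAUSE instruction; a no-op on other targets.
inline void cpu_relax() noexcept {
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
    _mm_pause();
#endif
}

// Test-and-test-and-set spinlock (a sketch, not production code).
class Spinlock {
public:
    void lock() noexcept {
        for (;;) {
            // One read-modify-write attempt; acquire ordering on success.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
            // While the lock is held, spin on plain loads only, so waiters
            // do not keep invalidating the cache line with writes.
            while (locked_.load(std::memory_order_relaxed)) {
                cpu_relax();
            }
        }
    }

    bool try_lock() noexcept {
        // Check first so that a failed attempt does not write at all.
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() noexcept { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical usage helper: several threads increment a shared counter.
long long parallel_increment(int num_threads, int iterations) {
    Spinlock lock;
    long long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<Spinlock> guard(lock);  // Spinlock is BasicLockable
                ++counter;
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

The key design choice is that only the exchange writes to the lock's cache line; waiters spin on loads that can be served from their own cache, which targets points 1 and 3. The pause hint targets the hyper-threading and loop-exit costs described in points 2 and 4.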

answered Sep 28 '22 00:09

David Schwartz

The main benefit of using a spinlock is that it is extremely cheap to acquire and release if the all-important precondition is true: there is little or no contention on the lock.

If you know with sufficient certainty that there will be no contention, a spinlock will greatly outperform a naive implementation of a mutex, which will go through library code doing validations that you don't necessarily need and then make a syscall. That means switching into the kernel (consuming several hundred cycles), possibly abandoning the thread's time slice, and having your thread rescheduled. Rescheduling may take an indefinite time: even if the lock becomes available almost immediately afterwards, under unfavorable conditions you can still wait several dozen milliseconds before your thread runs again.

If, however, the precondition of no contention does not hold, a spinlock will usually be vastly inferior, because it makes no progress but still consumes CPU resources as if it were performing work. When blocking on a mutex, your thread does not consume CPU resources, so these can be used by a different thread to do work, or the CPU may throttle down, saving power. That's not possible with a spinlock, which is doing "active work" until it succeeds (or fails).

In the worst case, if the number of waiters is greater than the number of CPU cores, spinlocks may cause huge, disproportionate performance impacts, because the threads that are active and running are waiting on a condition that can never happen while they are running (since releasing the lock requires a different thread to run!).

On the other hand, one should expect every modern no-suck implementation of std::mutex to already include a tiny spinlock before falling back to doing a syscall. But... while it is a reasonable assumption, this is not guaranteed.
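The "tiny spinlock before falling back to a syscall" strategy can be sketched as a wrapper around std::mutex that makes a bounded number of try_lock attempts before calling the blocking lock(). SpinThenBlockMutex, kSpinLimit (set to an arbitrary 100), cpu_relax and parallel_increment are hypothetical names, and this sketch is not a description of what any particular standard library actually does; real implementations tune or adapt the spin count. Each try_lock is still a read-modify-write, so the cache-traffic concern raised in the other answer applies to the spin phase; the sketch only shows the control flow.

```cpp
#include <mutex>
#include <thread>
#include <vector>
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>  // _mm_pause
#endif

// Spin-wait hint: the x86 PAUSE instruction; a no-op on other targets.
inline void cpu_relax() noexcept {
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
    _mm_pause();
#endif
}

// Sketch of "spin briefly, then block".
class SpinThenBlockMutex {
public:
    void lock() {
        for (int i = 0; i < kSpinLimit; ++i) {
            if (m_.try_lock()) {
                return;  // acquired during the spin phase, no sleeping needed
            }
            cpu_relax();
        }
        m_.lock();  // still contended: let std::mutex block this thread
    }

    bool try_lock() { return m_.try_lock(); }
    void unlock() { m_.unlock(); }

private:
    static constexpr int kSpinLimit = 100;  // arbitrary, hypothetical tuning value
    std::mutex m_;
};

// Hypothetical usage helper: several threads increment a shared counter.
long long parallel_increment(int num_threads, int iterations) {
    SpinThenBlockMutex mtx;
    long long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinThenBlockMutex> guard(mtx);
                ++counter;
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

If the lock is released while a waiter is still in the spin phase, the waiter avoids sleeping entirely; if not, it stops burning CPU after kSpinLimit attempts, which limits the oversubscription problem described above.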

Another non-technical reason for using spinlocks in favor of a std::mutex may be license terms. License terms are a poor rationale for a design decision, but they may nevertheless be very real.
For example, the present GCC implementation is based exclusively on pthreads, which implies that "anything MinGW" using anything from the standard threads library necessarily links with winpthreads (lacking alternatives). That means you are subject to the winpthreads license, which implies you must reproduce their copyright message. For some people, that's a dealbreaker.

answered Sep 28 '22 02:09

Damon
