Spinlocks, How Useful Are They?

It depends on what you're doing. In general application code, you'll want to avoid spinlocks.

In low-level code where you'll only hold the lock for a couple of instructions and latency is important, a spinlock may be a better solution than a blocking lock. But those cases are rare, especially in the kind of applications where C# is typically used.


In C#, "Spin locks" have been, in my experience, almost always worse than taking a lock - it's a rare occurrence where spin locks will outperform a lock.

However, that's not always the case. .NET 4 adds a System.Threading.SpinLock structure, which provides benefits in situations where a lock is held for a very short time and grabbed repeatedly. From the MSDN docs on Data Structures for Parallel Programming:

In scenarios where the wait for the lock is expected to be short, SpinLock offers better performance than other forms of locking.
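
For illustration, a minimal sketch of that pattern (the counter class and field names here are mine, not from the docs); note that Enter takes a ref bool so the caller can tell whether the lock was actually acquired:

using System.Threading;

class Counter
{
    // SpinLock is a struct: don't copy it, and don't mark the field readonly.
    private SpinLock _lock = new SpinLock();
    private int _value;

    public void Increment()
    {
        bool lockTaken = false;
        try
        {
            _lock.Enter(ref lockTaken);   // spins briefly instead of blocking
            _value++;                     // keep the critical section tiny
        }
        finally
        {
            if (lockTaken) _lock.Exit();
        }
    }
}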

Spin locks can outperform other locking mechanisms in cases where you're doing something like locking through a tree - if you're only holding the lock on each node for a very, very short period of time, they can outperform a traditional lock. I ran into this at one point in a rendering engine with a multithreaded scene update - spin locks profiled out to outperform locking with Monitor.Enter.


For my realtime work, particularly with device drivers, I've used them a fair bit. It turns out that (when I last timed this) waiting for a sync object like a semaphore tied to a hardware interrupt chews up at least 20 microseconds, no matter how long it actually takes for the interrupt to occur. A single check of a memory-mapped hardware register, followed by a check of RDTSC (to allow for a time-out so you don't lock up the machine), is in the high nanosecond range (basically down in the noise). For hardware-level handshaking that shouldn't take much time at all, it is really tough to beat a spinlock.
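
The shape of that loop, as a hedged sketch in C# rather than real driver code - Stopwatch.GetTimestamp stands in for RDTSC, and the int flag stands in for the memory-mapped register:

using System.Diagnostics;
using System.Threading;

static class HardwarePoll
{
    // Spin on a "ready" flag, but bail out after timeoutTicks so a
    // wedged device can't hang the machine.
    public static bool WaitForReady(ref int readyFlag, long timeoutTicks)
    {
        long start = Stopwatch.GetTimestamp();      // analogue of reading RDTSC
        while (Volatile.Read(ref readyFlag) == 0)
        {
            if (Stopwatch.GetTimestamp() - start > timeoutTicks)
                return false;                       // timed out
        }
        return true;                                // device signalled ready
    }
}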


My 2c: if your updates satisfy a few access criteria, they are good spinlock candidates (a sketch follows the list):

  • fast - you will have time to acquire the spinlock, perform the updates, and release the spinlock within a single thread quantum, so that you don't get pre-empted while holding it
  • localized - all the data you update is preferably in one single page that is already loaded; you do not want a TLB miss while you're holding the spinlock, and you definitely don't want a page fault swap read!
  • atomic - you do not need any other lock to perform the operation, i.e. never wait for locks under a spinlock

For anything that has any potential to yield, you should use a notified lock structure (events, mutexes, semaphores etc).
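
A hedged reading of those criteria in C# (the tracker class is hypothetical): the critical section writes two adjacent fields, allocates nothing, and never takes another lock:

using System.Threading;

class MinMaxTracker
{
    private SpinLock _lock = new SpinLock();
    private double _min = double.MaxValue;   // adjacent fields, so the
    private double _max = double.MinValue;   // update stays localized

    public void Observe(double x)
    {
        bool taken = false;
        try
        {
            _lock.Enter(ref taken);
            // fast: a couple of compares and stores, no allocation,
            // no I/O, and no other lock taken while this one is held
            if (x < _min) _min = x;
            if (x > _max) _max = x;
        }
        finally
        {
            if (taken) _lock.Exit();
        }
    }
}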


One use case for spin locks is when you expect very low contention but are going to have a lot of them. If you don't need support for recursive locking, a spinlock can be implemented in a single byte, and if contention is very low then the CPU cycle waste is negligible.

For a practical use case, I often have arrays of thousands of elements, where updates to different elements of the array can safely happen in parallel. The odds of two threads trying to update the same element at the same time are very small (low contention) but I need one lock for every element (I'm going to have a lot of them). In these cases, I usually allocate an array of ubytes of the same size as the array I'm updating in parallel and implement spinlocks inline as (in the D programming language):

// Spin until we atomically swap the lock byte from 0 (free) to 1 (held).
while(!atomicCasUbyte(spinLocks[i], 0, 1)) {}
myArray[i] = newVal;              // critical section: update one element
atomicSetUbyte(spinLocks[i], 0);  // release the lock

On the other hand, if I had to use regular locks, I would have to allocate an array of pointers to Objects, and then allocate a Mutex object for each element of this array. In scenarios such as the one described above, this is just plain wasteful.
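
For comparison, since the question is about C#: a rough sketch of the same striped-lock idea using Interlocked.CompareExchange (class and field names are mine; it uses an int per element rather than a byte, since the classic Interlocked overloads work on ints):

using System.Threading;

class StripedArray
{
    private readonly double[] _data;
    private readonly int[] _locks;   // 0 = free, 1 = held; one lock per element

    public StripedArray(int n)
    {
        _data = new double[n];
        _locks = new int[n];
    }

    public void Update(int i, double newVal)
    {
        // Spin until we atomically swap slot i's lock from 0 to 1.
        while (Interlocked.CompareExchange(ref _locks[i], 1, 0) != 0) { }
        _data[i] = newVal;                 // very short critical section
        Volatile.Write(ref _locks[i], 0);  // release the lock
    }
}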