Is my spin lock implementation correct and optimal?

I'm using a spin lock to protect a very small critical section. Contention happens very rarely so a spin lock is more appropriate than a regular mutex.

My current code is as follows, and assumes x86 and GCC:

volatile int exclusion = 0;

void lock() {
    while (__sync_lock_test_and_set(&exclusion, 1)) {
        // Do nothing. This GCC builtin instruction
        // ensures memory barrier.
    }
}

void unlock() {
    __sync_synchronize(); // Memory barrier.
    exclusion = 0;
}

So I'm wondering:

  • Is this code correct? Does it correctly ensure mutual exclusion?
  • Does it work on all x86 operating systems?
  • Does it work on x86_64 too? On all operating systems?
  • Is it optimal?
    • I've seen spin lock implementations using compare-and-swap but I'm not sure which is better.
    • According to the GCC atomic builtins documentation (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html) there's also __sync_lock_release. I'm not an expert on memory barriers so I'm not sure whether it's okay for me to use this instead of __sync_synchronize.
    • I'm optimizing for the case in which there's no contention.

I do not care at all about contention. There may be 1, maybe 2 other threads trying to lock the spin lock once every few days.

asked Sep 05 '09 by Hongli



2 Answers

Looks fine to me. By the way, here is the textbook test-and-test-and-set implementation, which is more efficient even in the contended case because the inner loop spins on plain reads instead of repeating the atomic write:

void lock(volatile int *exclusion) {
    while (__sync_lock_test_and_set(exclusion, 1)) {
        // Spin on plain reads until the lock looks free,
        // then retry the atomic test-and-set.
        while (*exclusion)
            ;
    }
}
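For the matching unlock, a minimal sketch assuming the same GCC builtins: __sync_lock_release stores 0 to the lock word with release semantics, which is all an unlock needs.

void unlock(volatile int *exclusion) {
    // Releases the lock by writing 0 with release semantics;
    // on x86 this compiles to a plain store plus a compiler barrier.
    __sync_lock_release(exclusion);
}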
answered Sep 20 '22 by sigjuice


So I'm wondering:

* Is it correct? 

In the context mentioned, I would say yes.

* Is it optimal? 

That's a loaded question. By reinventing the wheel you are also reinventing a lot of problems that have been solved by other implementations:

  • I'd expect a waste loop on failure, one that spins on plain reads of the lock word rather than retrying the atomic operation (see the sketch after this list).

  • Use of a full barrier in the unlock is more than you need; the unlock only requires release semantics. That's why you'd use __sync_lock_release, so that you'd get st1.rel on Itanium instead of mf, or an lwsync on PowerPC, and so on. If you really only care about x86 or x86_64, the exact barriers used here don't matter as much (but if you were to make the jump to Intel's Itanium for an HP-IPF port, you wouldn't want this).

  • You don't have the pause instruction that you'd normally put inside your waste loop.

  • When there is contention you want to block: a futex, semop, or even a dumb sleep in desperation. If you really need the performance that the spin lock buys you, then the futex suggestion is probably a good one. If you need that performance badly enough to maintain this code, you have a lot of research to do.
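Folding the first three points together, a minimal sketch of the acquire path, assuming x86 and GCC; _mm_pause from the intrinsics headers emits the pause instruction:

#include <immintrin.h>  // _mm_pause

void lock(volatile int *exclusion) {
    while (__sync_lock_test_and_set(exclusion, 1)) {
        // Waste loop: spin on plain reads so the lock word's cache line
        // isn't bounced around by repeated atomic writes, and hint the
        // CPU that this is a spin-wait loop.
        while (*exclusion)
            _mm_pause();
    }
}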

Note that there was a comment saying that the release barrier wasn't required. That isn't true even on x86, because the release barrier also serves as an instruction to the compiler not to shuffle other memory accesses around the "barrier", much like what you'd get if you used asm("" ::: "memory").
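To illustrate, here is an unlock written with just such a compiler barrier; this sketch relies on x86's strong memory model, where an ordinary store already has release semantics at the hardware level:

void unlock(volatile int *exclusion) {
    // Compiler-only barrier: emits no machine instruction, but stops the
    // compiler from moving earlier memory accesses past this point.
    asm volatile("" ::: "memory");
    *exclusion = 0;
}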

* On compare and swap 

On x86, __sync_lock_test_and_set will map to an xchg instruction, which has an implied lock prefix. It's definitely the most compact generated code (especially if you use a byte for the "lock word" instead of an int), but it is no less correct than if you used LOCK CMPXCHG. Compare-and-swap can be used for fancier algorithms (like putting a non-zero pointer to metadata for the first "waiter" into the lock word on failure).
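For comparison, a minimal CAS-based acquire using GCC's __sync_bool_compare_and_swap (lock_cas is a hypothetical name for illustration):

void lock_cas(volatile int *exclusion) {
    // Acquire only on the 0 -> 1 transition; on x86 this compiles
    // to LOCK CMPXCHG rather than xchg.
    while (!__sync_bool_compare_and_swap(exclusion, 0, 1)) {
        while (*exclusion)
            ;  // Spin on plain reads between CAS attempts.
    }
}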

answered Sep 24 '22 by Peeter Joot