The example implementation Wikipedia provides for a spinlock using the x86 XCHG instruction is:
    ; Intel syntax

    locked:                       ; The lock variable. 1 = locked, 0 = unlocked.
        dd      0

    spin_lock:
        mov     eax, 1            ; Set the EAX register to 1.
        xchg    eax, [locked]     ; Atomically swap the EAX register with
                                  ;  the lock variable.
                                  ; This will always store 1 to the lock, leaving
                                  ;  the previous value in the EAX register.
        test    eax, eax          ; Test EAX with itself. Among other things, this will
                                  ;  set the processor's Zero Flag if EAX is 0.
                                  ; If EAX is 0, then the lock was unlocked and
                                  ;  we just locked it.
                                  ; Otherwise, EAX is 1 and we didn't acquire the lock.
        jnz     spin_lock         ; Jump back to the MOV instruction if the Zero Flag is
                                  ;  not set; the lock was previously locked, and so
                                  ;  we need to spin until it becomes unlocked.
        ret                       ; The lock has been acquired, return to the calling
                                  ;  function.

    spin_unlock:
        mov     eax, 0            ; Set the EAX register to 0.
        xchg    eax, [locked]     ; Atomically swap the EAX register with
                                  ;  the lock variable.
        ret                       ; The lock has been released.
(from https://en.wikipedia.org/wiki/Spinlock#Example_implementation)
What I don't understand is why the unlock would need to be atomic. What's wrong with
    spin_unlock:
        mov     [locked], 0
The unlock does need to have release semantics to properly protect the critical section. But it doesn't need sequential-consistency. Atomicity isn't really the issue (see below).
So yes, on x86 a simple store is safe, and glibc's pthread_spin_unlock does so:

        movl    $1, (%rdi)
        xorl    %eax, %eax
        retq
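The ordering the answer describes can be sketched with C11 atomics (the function and variable names here are mine, not from glibc or the question): an exchange with acquire semantics takes the lock, and a plain release store drops it. On x86, compilers emit the release store as an ordinary mov, matching the glibc code above.

```c
#include <stdatomic.h>

static atomic_int locked_c = 0;     /* 1 = locked, 0 = unlocked */

static void spinlock_lock(void) {
    /* Same idea as the xchg loop: swap in 1 until the old value is 0. */
    while (atomic_exchange_explicit(&locked_c, 1, memory_order_acquire) != 0)
        ;  /* spin */
}

static void spinlock_unlock(void) {
    /* Release, not seq_cst: dropping a lock doesn't need a full barrier. */
    atomic_store_explicit(&locked_c, 0, memory_order_release);
}
```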
See also a simple but maybe usable x86 spinlock implementation I wrote in this answer, using a read-only spin loop with a pause instruction.
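A read-only spin loop of that kind ("test-and-test-and-set") can be sketched like this in C; the names are mine, and the pause intrinsic is guarded so the sketch stays portable. The point is to only attempt the expensive atomic exchange when a cheap relaxed load says the lock looks free, spinning read-only with pause in between, which keeps the cache line in shared state instead of bouncing it between cores.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()     /* the x86 pause instruction */
#else
#define cpu_relax() ((void)0)       /* portable no-op fallback */
#endif

static atomic_int lk = 0;           /* 1 = locked, 0 = unlocked */

static void spinlock_lock_ttas(void) {
    for (;;) {
        if (atomic_exchange_explicit(&lk, 1, memory_order_acquire) == 0)
            return;                 /* acquired */
        while (atomic_load_explicit(&lk, memory_order_relaxed) != 0)
            cpu_relax();            /* read-only spin until it looks free */
    }
}
```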
Possibly this code was adapted from a bit-field version. Unlocking with btr to zero one flag in a bitfield isn't safe, because it's a non-atomic read-modify-write of the containing byte (or of the containing naturally-aligned 4-byte dword or 2-byte word).
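The hazard can be shown with a deterministic, single-threaded simulation (the variable names and bit assignments are illustrative, not from the question): clearing bit 0 via a plain read-modify-write writes the whole dword back, so an update to bit 1 that lands between the read and the write is silently lost.

```c
#include <stdint.h>

/* Simulates the lost-update race of a non-lock'ed btr-style unlock. */
static uint32_t simulate_nonatomic_btr(void) {
    uint32_t flags = 0x1;       /* bit 0 = our lock, currently held   */
    uint32_t tmp   = flags;     /* step 1: read the containing dword  */
    flags |= 0x2;               /* "another thread" sets bit 1 here   */
    flags  = tmp & ~0x1u;       /* step 2: write back with bit 0 clear */
    return flags;               /* bit 1's update has been wiped out  */
}
```

The function returns 0 rather than the 0x2 the other thread expected to survive; a lock-prefixed btr (or an atomic fetch-and) avoids this.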
So maybe whoever wrote it didn't realize that simple stores to aligned addresses are atomic on x86, like on most ISAs. But what x86 has that weakly-ordered ISAs don't is that every store has release semantics. Using an xchg to release the lock makes every unlock a full memory barrier, which goes beyond normal locking semantics. (Although on x86, taking a lock is already a full barrier, because there's no way to do an atomic RMW or atomic compare-and-swap without an xchg or other lock-prefixed instruction, and those are full barriers like mfence.)
The unlocking store doesn't technically even need to be atomic, since we only ever store 0 or 1, so only the low byte matters. I think it would still work even if the lock were unaligned and split across a cache-line boundary: tearing can happen, but it doesn't matter, because what's really happening is that the low byte of the lock is modified atomically, by operations that always write zeros into the upper 3 bytes.
If you wanted to return the old value to catch double-unlocking bugs, a better implementation would separately load and store:
    spin_unlock:
        ;; pre-condition: [locked] is non-zero
        mov     eax, [locked]       ; old value, for debugging
        mov     dword [locked], 0   ; On x86, this is an atomic store with "release" semantics.

        ;test   eax, eax
        ;jz     double_unlocking_detected   ; or leave this to the caller
        ret
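The same separate-load-then-store idea looks like this in C11 atomics (a sketch; the names are mine): a relaxed load grabs the old value for debugging, then a release store publishes the unlock. No atomic RMW is needed at any point.

```c
#include <stdatomic.h>

static atomic_int lk_dbg = 1;       /* assume the lock starts out held */

/* Returns the old lock value; 0 indicates a double-unlock bug. */
static int spin_unlock_dbg(void) {
    int old = atomic_load_explicit(&lk_dbg, memory_order_relaxed);
    atomic_store_explicit(&lk_dbg, 0, memory_order_release);
    return old;
}
```

The caller (or the function itself) can then assert that the returned value was non-zero.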