How would you write a function in C which does an atomic compare and swap on an integer value, using embedded machine code (assuming, say, x86 architecture)? Can it be any more specific if its written only for the i7 processor?
Does the translation act as a memory fence, or does it just ensure ordering relation just on that memory location included in the compare and swap? How costly is it compared to a memory fence?
Thank you.
The easiest way to do it is probably with a compiler intrinsic like _InterlockedCompareExchange(). It looks like a function but is actually a special case in the compiler that boils down to a single machine op. In the case of the MSVC x86 intrinsic, that works as a read/write fence as well, but that's not necessarily true on other platforms. (For example, on the PowerPC, you'd need to explicitly issue a lwsync to fence memory reordering.)
In general, on many common systems, a compare-and-swap operation usually only enforces an atomic transaction upon the one address it's touching. Other memory access can be reordered, and in multicore systems, memory addresses other than the one you've swapped may not be coherent between the cores.
You can use the CMPXCHG
instruction with the LOCK
prefix for atomic execution.
E.g.
lock cmpxchg DWORD PTR [ebx], edx
or
lock cmpxchgl %edx, (%ebx)
This compares the value in the EAX register with the value at the address stored in the EBX register and stores the value in the EDX register to that location if they are the same, otherwise it loads the value at the address stored in the EBX register into EAX.
You need to have a 486 or later for this instruction to be available.
If your integer value is 64 bit than use cmpxchg8b 8 byte compare and exchange under IA32 x86. Variable must be 8 byte aligned.
Example:
mov eax, OldDataA //load Old first 32 bits
mov edx, OldDataB //load Old second 32 bits
mov ebx, NewDataA //load first 32 bits
mov ecx, NewDataB //load second 32 bits
mov edi, Destination //load destination pointer
lock cmpxchg8b qword ptr [edi]
setz al //if transfer is succesful the al is 1 else 0
If the LOCK prefix is omitted in atomic processor instructions, atomic operation across multiprocessor environment will not be guaranteed.
In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted. Intel Instruction Set Reference
Without LOCK prefix the operation will guarantee not being interrupted by any event (interrupt) on current processor/core only.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With