
Compare and swap in machine code in C

How would you write a function in C that performs an atomic compare-and-swap on an integer value using embedded machine code (assuming, say, the x86 architecture)? Can it be made any more specific if it's written only for the i7 processor?

Does the resulting instruction act as a memory fence, or does it only enforce ordering on the memory location involved in the compare-and-swap? How costly is it compared to a memory fence?

Thank you.

axel22 asked Nov 18 '10 10:11


4 Answers

The easiest way to do it is probably with a compiler intrinsic like _InterlockedCompareExchange(). It looks like a function, but it is actually a special case in the compiler that boils down to a single machine instruction. In the case of the MSVC x86 intrinsic, it works as a read/write fence as well, but that's not necessarily true on other platforms. (For example, on PowerPC you'd need to explicitly issue an lwsync to fence memory reordering.)
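
Where the MSVC intrinsic is not available, a similar call can be written with C11's <stdatomic.h>. This is a minimal sketch that swaps in the standard C11 API for the intrinsic named above; cas_int is a hypothetical helper name, and the default (sequentially consistent) ordering is assumed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically: if (*target == expected) *target = desired.
 * Returns true when the swap happened.
 * cas_int is an illustrative name, not a library function. */
static bool cas_int(atomic_int *target, int expected, int desired)
{
    /* On failure the call writes the value it observed into `expected`.
     * `expected` is a local copy here, so the caller's variable is
     * left untouched. */
    return atomic_compare_exchange_strong(target, &expected, desired);
}
```

A caller would typically read the current value, compute a new one, and retry cas_int until it returns true.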

On many common systems, a compare-and-swap operation only guarantees atomicity for the one address it touches. Other memory accesses can be reordered around it, and on multicore systems, writes to addresses other than the one you've swapped may not yet be visible to the other cores.
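
The difference between "atomic on one address" and "fence" can be sketched with C11 atomics (used here in place of a platform intrinsic; cas_relaxed and cas_with_fence are hypothetical names). The first function asks only for atomicity on the one location, while the second adds an explicit full fence afterwards. On x86 the relaxed version is still typically compiled to lock cmpxchg, which acts as a full barrier in hardware, so the weaker ordering mainly matters for the compiler's own reordering and on architectures such as PowerPC.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomic on *target only: requests no ordering with respect to
 * other memory accesses. */
static bool cas_relaxed(atomic_int *target, int expected, int desired)
{
    return atomic_compare_exchange_strong_explicit(
        target, &expected, desired,
        memory_order_relaxed,   /* ordering if the swap succeeds */
        memory_order_relaxed);  /* ordering if the swap fails    */
}

/* Same swap, followed by an explicit full fence so that earlier
 * loads and stores are ordered before later ones. */
static bool cas_with_fence(atomic_int *target, int expected, int desired)
{
    bool swapped = cas_relaxed(target, expected, desired);
    atomic_thread_fence(memory_order_seq_cst);
    return swapped;
}
```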

Crashworks answered Oct 29 '22 08:10


You can use the CMPXCHG instruction with the LOCK prefix for atomic execution.

For example, in Intel syntax:

lock cmpxchg DWORD PTR [ebx], edx

or, in AT&T syntax:

lock cmpxchgl %edx, (%ebx)

This compares the value in the EAX register with the value at the address held in the EBX register. If they are equal, the value in the EDX register is stored to that address and the zero flag (ZF) is set. Otherwise, the value at that address is loaded into EAX and ZF is cleared.

You need to have a 486 or later for this instruction to be available.
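
Wrapped in a C function, the instruction might look like the sketch below. It assumes GCC or Clang extended inline assembly in the default AT&T syntax; cas32 is a hypothetical name, and the non-x86 branch falls back to the GCC builtin __sync_val_compare_and_swap so the file still compiles elsewhere.

```c
#include <stdint.h>

/* Returns the value that was in *ptr before the operation.
 * The swap happened if and only if the return value equals `expected`.
 * cas32 is an illustrative name, not a library function. */
static int32_t cas32(volatile int32_t *ptr, int32_t expected, int32_t desired)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int32_t prev = expected;  /* CMPXCHG compares memory against EAX */
    __asm__ __volatile__(
        "lock; cmpxchgl %[new], %[mem]"
        : "+a"(prev),          /* EAX: expected in, observed value out */
          [mem] "+m"(*ptr)     /* the memory operand is read and written */
        : [new] "r"(desired)   /* any general-purpose register */
        : "memory", "cc");     /* compiler barrier; flags are modified */
    return prev;
#else
    /* Assumption: a GCC-compatible compiler on a non-x86 target. */
    return __sync_val_compare_and_swap(ptr, expected, desired);
#endif
}
```

The "memory" clobber only stops the compiler from moving other accesses across the asm statement; the hardware ordering comes from the LOCK prefix itself.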

CB Bailey answered Oct 29 '22 08:10


If your integer value is 64 bits wide, then use cmpxchg8b, the 8-byte compare and exchange available on IA-32 (x86). The variable should be 8-byte aligned.

Example:
      mov   eax, OldDataA           //expected value, low 32 bits  (EDX:EAX)
      mov   edx, OldDataB           //expected value, high 32 bits
      mov   ebx, NewDataA           //new value, low 32 bits       (ECX:EBX)
      mov   ecx, NewDataB           //new value, high 32 bits
      mov   edi, Destination        //load destination pointer
      lock cmpxchg8b qword ptr [edi]
      setz  al                      //al is 1 if the exchange succeeded, else 0
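
The same sequence can be sketched as a C function using GCC/Clang extended inline assembly. This is an assumption-laden illustration, not the answer's own code: cas64 is a hypothetical name, the register assignments mirror the listing above, and non-x86 targets fall back to the GCC builtin __sync_bool_compare_and_swap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if *ptr equalled `expected` and was replaced by `desired`.
 * cas64 is an illustrative name, not a library function. */
static bool cas64(volatile uint64_t *ptr, uint64_t expected, uint64_t desired)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t old_lo = (uint32_t)expected;          /* goes in EAX */
    uint32_t old_hi = (uint32_t)(expected >> 32);  /* goes in EDX */
    unsigned char ok;
    __asm__ __volatile__(
        "lock; cmpxchg8b %[mem]\n\t"
        "sete %[ok]"                       /* ZF set means the swap happened */
        : [mem] "+m"(*ptr),
          "+a"(old_lo), "+d"(old_hi),      /* EDX:EAX = expected value */
          [ok] "=qm"(ok)
        : "b"((uint32_t)desired),          /* ECX:EBX = new value */
          "c"((uint32_t)(desired >> 32))
        : "memory", "cc");
    return ok != 0;
#else
    /* Assumption: a GCC-compatible compiler on a non-x86 target. */
    return __sync_bool_compare_and_swap(ptr, expected, desired);
#endif
}
```

On x86-64, a single lock cmpxchgq on a 64-bit register would normally be used instead; cmpxchg8b is shown here only because it is the instruction under discussion.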

GJ. answered Oct 29 '22 09:10


If the LOCK prefix is omitted, the instruction is not guaranteed to be atomic in a multiprocessor environment.

"In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted." (Intel Instruction Set Reference)

Without the LOCK prefix, the instruction is only guaranteed not to be interrupted by an event (such as an interrupt) on the current processor/core.

bkausbk answered Oct 29 '22 09:10