When to use Test&Set or Test&Test&Set?

Question

Parallel programming on x86 can be a hard job, especially on multi-core CPUs. Let's say that we have a multi-core x86 CPU and several different multithreaded communication patterns:

Single writer and single reader
Single reader multiple writers
Multiple readers and single writer
Multiple readers and multiple writers

So which model is better (more efficient) for locking a shared memory region, Test&Set or Test&Test&Set, and when should each be used?

Here I have two simple test procedures (they spin with no time limit), written as x86 assembler in the Delphi IDE:

procedure TestAndSet(const oldValue, newValue: cardinal; var destination);
asm
//eax = oldValue (the value the lock must hold for the swap to succeed)
//edx = newValue (the value written when the lock is taken)
//ecx = destination = 32-bit pointer to the lock variable, 4-byte aligned
@RepeatSpinLoop:
        push    eax                   //Save lock oldValue (compared)
        pause                         //CPU spin-loop hint
        lock    cmpxchg dword ptr [ecx], edx
        pop     eax                   //Restore eax as oldValue
        jnz     @RepeatSpinLoop       //Repeat if cmpxchg wasn't successful
end;

procedure TestAndTestAndSet(const oldValue, newValue: cardinal; var destination);
asm
//eax = oldValue (the value the lock must hold for the swap to succeed)
//edx = newValue (the value written when the lock is taken)
//ecx = destination = 32-bit pointer to the lock variable, 4-byte aligned
@RepeatSpinLoop:
        push    eax                   //Save lock oldValue (compared)
@SpinLoop:
        pause                         //CPU spin-loop hint
        cmp     dword ptr [ecx], eax  //Test before test&set (plain read, no lock prefix)
        jnz     @SpinLoop
        lock    cmpxchg dword ptr [ecx], edx
        pop     eax                   //Restore eax as oldValue
        jnz     @RepeatSpinLoop       //Repeat if cmpxchg wasn't successful
end;

EDIT:

Intel's documentation mentions two approaches: Test&Set and Test&Test&Set. I want to establish in which cases one approach is better than the other, and so when to use each. Check: Intel

Despatcher · Accepted Answer

Surely the first (TestAndSet) is better, because the second does not achieve much by repeating the test with cmp and jnz in between. While you are doing this, the destination value may change anyway, as it is not locked.

Andras Vass · Answer

TTAS (#2) is good practice. "Lurking" and waiting for the "opportunity" before doing a CAS is common practice in both Java and .NET concurrent classes. That said, cmpxchg has received quite a lot of optimizations in the last few years, so you might get nearly identical results on the latest crop of processors.

What you should try in both cases, however, is to employ some exponential backoff when you spin.

Update

@GJ: You should find some more up-to-date documentation on Intel's site. Note the paragraph about not locking the bus since the 486, and the comparison chart of xchg and cmpxchg that shows they are practically identical.

Spinning on a read rather than on a locked instruction is still a good idea, because it avoids some of the contention over getting the cache line in exclusive mode. (So: TTAS.)
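To make the difference concrete, here is a minimal sketch of the two spin strategies in C++ rather than Delphi assembler. The names `TasLock`, `TtasLock` and `run_counter` are made up for this illustration, the lock is a simple boolean flag instead of the oldValue/newValue pair from the question, and the `pause` hint is left out to keep the code portable. Treat it as an illustration of the idea, not a tuned implementation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Test&Set: every spin iteration is an atomic read-modify-write.
struct TasLock {
    std::atomic<bool> locked{false};

    void lock() {
        // exchange() compiles to a locked instruction on x86, so each waiter
        // keeps asking for the cache line in exclusive state.
        while (locked.exchange(true, std::memory_order_acquire)) {
        }
    }

    void unlock() {
        assert(locked.load(std::memory_order_relaxed) && "unlock without lock");
        locked.store(false, std::memory_order_release);
    }
};

// Test&Test&Set: spin on a plain read, attempt the exchange only when the
// lock looks free.
struct TtasLock {
    std::atomic<bool> locked{false};

    void lock() {
        for (;;) {
            // Plain loads let all waiters keep a shared copy of the line.
            while (locked.load(std::memory_order_relaxed)) {
            }
            // The lock looked free; it may have been taken since, so the
            // exchange can still fail and send us back to the read loop.
            if (!locked.exchange(true, std::memory_order_acquire)) return;
        }
    }

    void unlock() {
        assert(locked.load(std::memory_order_relaxed) && "unlock without lock");
        locked.store(false, std::memory_order_release);
    }
};

// Hypothetical helper: `threads` workers each increment a plain counter
// `iters` times under the given lock type; returns the final count.
template <class Lock>
long run_counter(int threads, int iters) {
    Lock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Both locks give the same result; what differs is how much cache-line traffic the waiters generate while the lock is held.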

However, this will provide a useful gain only if you implement, e.g., an exponential back-off, possibly even yielding the CPU after a while.
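A rough C++ sketch of what such a back-off could look like follows. The `Backoff` and `BackoffTtasLock` classes, the `cpu_relax` helper and the default limits (4 and 1024 spins) are assumptions chosen for illustration; sensible limits depend on the hardware and the length of the critical section.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
// On x86, emit the same `pause` hint the question's assembler uses.
inline void cpu_relax() { _mm_pause(); }
#else
// Fallback for other targets (an assumption, not a tuned choice).
inline void cpu_relax() { std::this_thread::yield(); }
#endif

// Hypothetical helper: doubles the number of pause iterations after each
// failed attempt; once the limit is reached it yields to the OS instead.
class Backoff {
    unsigned spins_;
    const unsigned max_spins_;

public:
    explicit Backoff(unsigned min_spins = 4, unsigned max_spins = 1024)
        : spins_(min_spins), max_spins_(max_spins) {}

    unsigned spins() const { return spins_; }

    void wait() {
        if (spins_ >= max_spins_) {
            std::this_thread::yield();  // give the CPU away after a while
            return;
        }
        for (unsigned i = 0; i < spins_; ++i) cpu_relax();
        spins_ = std::min(spins_ * 2, max_spins_);
    }
};

// TTAS lock that backs off whenever the lock is seen held or the
// exchange loses the race.
class BackoffTtasLock {
    std::atomic<bool> locked_{false};

public:
    bool try_lock() {
        // Test with a plain read first, then test-and-set.
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() {
        Backoff backoff;  // starts from the minimum on every acquisition
        while (!try_lock()) backoff.wait();
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed) && "unlock without lock");
        locked_.store(false, std::memory_order_release);
    }
};
```

Resetting the back-off on every call to `lock()` is a design choice: it keeps uncontended acquisitions cheap and only slows down threads that keep losing the race.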

The differences between TTAS and TAS, with or without backoff, would be smaller on a single modern multi-core CPU whose cores share an L3 cache, and would become more visible on a multi-socket (e.g. server) machine or on a multi-core CPU that has no cache shared between the cores. They would also vary with the amount of contention (i.e. under light load the difference between TTAS and TAS would be smaller).
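Since the outcome depends on the machine and the load, it is worth measuring rather than guessing. The following C++ sketch is one hypothetical way to do that: the `measure` function and the `Stats` struct are invented for this example, and the number of locked exchanges per acquisition is used only as a rough proxy for cache-line traffic, not as a substitute for timing on the target hardware.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Totals for one run: successful acquisitions, and the number of locked
// exchange operations issued to obtain them.
struct Stats {
    std::uint64_t acquisitions = 0;
    std::uint64_t exchanges = 0;
};

// Hypothetical harness. test_first = false spins TAS-style,
// test_first = true spins TTAS-style (plain read before the exchange).
Stats measure(bool test_first, int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    std::atomic<bool> locked{false};
    std::vector<Stats> per_thread(threads);
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            Stats local;  // thread-local counts, so counting adds no sharing
            for (int i = 0; i < iters; ++i) {
                for (;;) {
                    // TTAS only: keep reading while the lock looks taken.
                    if (test_first && locked.load(std::memory_order_relaxed))
                        continue;
                    ++local.exchanges;
                    if (!locked.exchange(true, std::memory_order_acquire))
                        break;  // lock acquired
                }
                ++local.acquisitions;
                locked.store(false, std::memory_order_release);
            }
            per_thread[t] = local;
        });
    }
    for (auto& th : pool) th.join();

    Stats total;
    for (const Stats& s : per_thread) {
        total.acquisitions += s.acquisitions;
        total.exchanges += s.exchanges;
    }
    return total;
}
```

With a single thread both variants need exactly one exchange per acquisition; comparing `exchanges / acquisitions` for the two variants at increasing thread counts gives a first impression of how contention affects each of them on a given machine.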

Tags: x86, multithreading, parallel-processing, delphi

GJ.
