Compare and Swap: synchronizing via different data sizes

Tags:

c++

gcc

atomic

Using the GCC builtin C atomic primitives, we can perform an atomic CAS operation using __atomic_compare_exchange.

Unlike C++11's std::atomic type, the GCC C atomic primitives operate on regular non-atomic integral types, including 128-bit integers on platforms where cmpxchg16b is supported. (A future version of the C++ standard may support similar functionality with the std::atomic_view class template.)
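For reference, a quick check along the following lines can confirm whether the target actually provides a lock-free 16-byte CAS. This is only a sketch: it assumes __int128 is available, on x86_64 it typically needs -mcx16, and some GCC versions route the 16-byte case through libatomic (so -latomic may be needed when linking, and the operation may not report as lock-free).

    // Sanity check: does this target give us a lock-free 16-byte CAS on a
    // plain, non-atomic object?
    #include <stdint.h>
    #include <stdio.h>

    int main()
    {
        unsigned __int128 wide = 0;   // ordinary, non-atomic object
        uint64_t narrow = 0;

        printf("8-byte CAS lock-free:  %d\n",
               (int) __atomic_is_lock_free(sizeof narrow, &narrow));
        printf("16-byte CAS lock-free: %d\n",
               (int) __atomic_is_lock_free(sizeof wide, &wide));

        unsigned __int128 expected = 0, desired = 42;
        // Generic 6-argument form: ptr, expected, desired, weak,
        // success memory order, failure memory order.
        int ok = __atomic_compare_exchange(&wide, &expected, &desired,
                                           0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        printf("CAS on a plain __int128 succeeded: %d\n", ok);
    }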

This raises the question:

What happens if an atomic CAS operation on a larger data size observes a change which happened by an atomic operation on the same memory location, but using a smaller data size?

For example, suppose we have:

struct uint128_type {
  uint64_t x;
  uint64_t y;
} __attribute__ ((aligned (16)));

And suppose we have a shared variable of type uint128_type, like:

uint128_type Foo;

Now, suppose Thread A does:

    uint128_type expected = { 0, 0 };
    uint128_type desired = { 100, 100 };
    int result = __atomic_compare_exchange(
        &Foo,
        &expected,
        &desired,
        0,                 /* strong CAS */
        __ATOMIC_SEQ_CST,  /* success memory order */
        __ATOMIC_SEQ_CST   /* failure memory order */
    );

And Thread B does:

    uint64_t expected = 0;
    uint64_t desired = 500;
    int result = __atomic_compare_exchange(
        &Foo.x,
        &expected,
        &desired,
        0,                 /* strong CAS */
        __ATOMIC_SEQ_CST,  /* success memory order */
        __ATOMIC_SEQ_CST   /* failure memory order */
    );

What happens if Thread A's 16-byte CAS happens before Thread B's 8-byte CAS (or vice versa)? Does the CAS fail as normal? Is this even defined behavior? Is this likely to "do the right thing" on typical architectures like x86_64 that support a 16-byte CAS?

Edit: to be clear, since it seems to be causing confusion, I'm not asking whether the above behavior is defined by the C++ standard. Obviously, all the __atomic_* functions are GCC extensions. (However, future C++ standards may have to define this sort of thing if std::atomic_view becomes standardized.) I am asking more generally about the semantics of atomic operations on typical modern hardware. As an example, if x86_64 code has two threads performing atomic operations on the same memory address, but one thread uses CMPXCHG8B and the other uses CMPXCHG16B, so that one does an atomic CAS on a single 8-byte word while the other does an atomic CAS on a 16-byte double word, how are the semantics of these operations defined? More specifically, would a CMPXCHG16B fail because it observes that the data has mutated from the expected value due to a previous CMPXCHG8B?

In other words, can two different CAS operations using two different data sizes (but the same starting memory address) safely be used to synchronize between threads?

asked Apr 15 '16 by Siler

2 Answers

One or the other happens first, and each operation proceeds according to its own semantics.

On x86 CPUs, both operations will require a lock on the same cache line held throughout the entire operation. So whichever one gets that lock first will not see any effects of the second operation and whichever one gets that lock second will see all the effects of the first operation. The semantics of both operations will be fully respected.

Other hardware may achieve this result some other way, but if it doesn't achieve this result, it's broken unless it specifies that it has a limitation.
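A minimal sketch of the race described in the question illustrates this (assuming x86_64 with cmpxchg16b; build with something like g++ -std=c++11 -mcx16 -pthread, possibly linking -latomic): whichever CAS serializes second observes the winner's write, fails, and has its expected buffer updated with the value it observed.

    // Sketch of the scenario from the question: one thread performs a 16-byte
    // CAS, the other an 8-byte CAS on the first half of the same object.
    // With these values, exactly one succeeds; the loser observes the winner's write.
    #include <stdint.h>
    #include <stdio.h>
    #include <thread>

    struct uint128_type {
        uint64_t x;
        uint64_t y;
    } __attribute__ ((aligned (16)));

    static uint128_type Foo = { 0, 0 };

    void thread_a()
    {
        uint128_type expected = { 0, 0 };
        uint128_type desired  = { 100, 100 };
        bool ok = __atomic_compare_exchange(&Foo, &expected, &desired,
                                            false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        printf("A (16-byte CAS): %s, observed x = %llu\n",
               ok ? "succeeded" : "failed", (unsigned long long) expected.x);
    }

    void thread_b()
    {
        uint64_t expected = 0;
        uint64_t desired  = 500;
        bool ok = __atomic_compare_exchange(&Foo.x, &expected, &desired,
                                            false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        printf("B (8-byte CAS):  %s, observed x = %llu\n",
               ok ? "succeeded" : "failed", (unsigned long long) expected);
    }

    int main()
    {
        std::thread a(thread_a), b(thread_b);
        a.join();
        b.join();
        printf("final Foo = { %llu, %llu }\n",
               (unsigned long long) Foo.x, (unsigned long long) Foo.y);
    }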

answered by David Schwartz

The atomic data will eventually be located somewhere in memory, and all accesses to it (or to the respective cache lines, when the operations are atomic) will be serialized. Since a CAS operation is supposed to be atomic, it is performed as a whole or not at all.

That being said, one of the operations will succeed and the other will fail; the order in which they happen is non-deterministic.

From the x86 Instruction Set Reference:

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)

Clearly, both threads will attempt a locked write after a locked read (when the LOCK prefix is used, that is), which means only one of them will succeed in performing the CAS; the other will read the already-changed value.
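A small sequential sketch (reusing the struct from the question; -latomic may be needed when linking) shows that failure path: once the 8-byte CAS has changed Foo.x, a subsequent 16-byte CAS fails and its expected buffer is overwritten with the value the locked read observed.

    // Failure path, shown sequentially: the 8-byte CAS changes Foo.x to 500,
    // so the later 16-byte CAS observes { 500, 0 } instead of { 0, 0 } and fails.
    #include <stdint.h>
    #include <stdio.h>

    struct uint128_type {
        uint64_t x;
        uint64_t y;
    } __attribute__ ((aligned (16)));

    int main()
    {
        uint128_type Foo = { 0, 0 };

        uint64_t exp8 = 0, des8 = 500;
        __atomic_compare_exchange(&Foo.x, &exp8, &des8,
                                  false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);

        uint128_type exp16 = { 0, 0 };
        uint128_type des16 = { 100, 100 };
        bool ok = __atomic_compare_exchange(&Foo, &exp16, &des16,
                                            false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);

        printf("16-byte CAS %s; it observed { %llu, %llu }\n",
               ok ? "succeeded" : "failed",
               (unsigned long long) exp16.x, (unsigned long long) exp16.y);
    }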

answered by Honza Remeš