Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is memory ordering in C++11 about main memory flush ordering?

I'm not sure i fully understand (and i may have all wrong) the concepts of atomicity and memory ordering in C++11. Let's take this simple example single threaded :

int main()
{
    std::atomic<int> a(0);
    std::atomic<int> b(0);
    a.store(16);
    b.store(10);

    return 0;
}

In this single threaded code, if a and b were not atomic types, the compiler could have reordered the instruction in a way that in the assembly code, i have for instance a move instruction to assigned 10 to 'b' before a move instruction to assigned 16 to 'a'. So for me, being atomic variables, it guarantees me that i'd have the "a move instruction" before the "b move instruction" as i stated in my source code. After that, there is the processor with his execution unit, prefetching instructions, and with his out-of-order box. And this processor can process the "b instruction" before the "a instruction", whatever is the instruction ordering in the assembly code. So i can have 10 stored in a register or in the store buffer of a processor or in cache memory before i have 16 stored in a register / store buffer or in cache.

And with my understanding, it's where memory ordering model come out. From that moment, if i let the default model sequentially consistent. One guarantees me that flush out these values (10 and 16) in main memory will respect the order i did the store in my source code. So that the processor will start flushing out the register or cache where 16 is stored into main memory for update 'a' and after that it will flush 10 in the main memory for 'b'.

So that behavior does allow me to understand that if i use a relaxed memory model. Only the last part is not guarantee so that the flush in main memory can be in total disorder.

Sorry if you get trouble to read me, my english is still poor. But thank you guys for your time.

like image 708
jedib Avatar asked Apr 16 '15 08:04

jedib


People also ask

What is memory model in C ++ 11?

C++11 Memory Model. A memory model, a.k.a memory consistency model, is a specification of the allowed behavior of multithreaded programs executing with shared memory [1].

What is memory order in C++?

(since C++20) std::memory_order specifies how memory accesses, including regular, non-atomic memory accesses, are to be ordered around an atomic operation.

What is Memory_order_seq_cst?

Essentially memory_order_acq_rel provides read and write orderings relative to the atomic variable, while memory_order_seq_cst provides read and write ordering globally. That is, the sequentially consistent operations are visible in the same order across all threads.

What is a relaxed Atomic?

Relaxed. Relaxed accesses are the absolute weakest. They can be freely re-ordered and provide no happens-before relationship. Still, relaxed operations are still atomic. That is, they don't count as data accesses and any read-modify-write operations done to them occur atomically.


2 Answers

The C++ memory model is about the abstract machine and value visibility, not about concrete things like "main memory", "write queues" or "flushing".

In your example, the memory model states that since the write to a happens-before the write to b, any thread that reads the 10 from b must, on subsequent reads from a, see 16 (unless this has since been overwritten, of course).

The important thing here is establishing happens-before relationships and value visibility. How this maps to caches and memory is up to the compiler. In my opinion, it's better to stay on that abstract level instead of trying to map the model to your understanding of the hardware, because

  • Your understanding of the hardware might be wrong. Hardware is even more complicated than the C++ memory model.
  • Even if your understanding is correct now, a later version of the hardware might have a different model, at least in subsystems.
  • By mapping to a hardware model, you might then make wrong assumptions about the implications for a different hardware model. E.g. if you understand how the memory model maps to x86 hardware, you will not understand the subtle difference between consume and acquire on PowerPC.
  • The C++ model is very well suited for reasoning about correctness.
like image 141
Sebastian Redl Avatar answered Sep 21 '22 14:09

Sebastian Redl


You didn't specify which architecture you work with, but basically each has its own memory ordering model (some times more than one that you can choose from), and that serves as a "contract". The compiler should be aware of that and use lightweight or heavyweight instructions accordingly to guarantee what it needs in order to provide the memory model of the language.

The HW implementation under the hood can be quite complicated, but in a nutshell - you don't need to flush in order to get global visibility. Modern cache systems provide snooping capabilities, so that a value can be globally visible and globally ordered while still residing in some private core cache (and having stale copies in lower cache levels), the MESI protocols control how this is handled correctly.

The life cycle of a write begins in the out of order engine, where it is still speculative (i.e. - can be cleared due to an older branch misprediction or fault). Naturally, during that time the write can not be seen from the outside, so out-of-order execution here is not relevant. Once it commits, if the system guarantees store ordering (like x86), it still has to wait in line for its turn to become visible, so it is buffered. Other cores can't see it since its observation time hasn't reached yet (although local loads in that core might see it in some implementations of x86 - that's one of the differences between TSO and real sequential consistency). Once the older stores are done, the store may become globally visible - it doesn't have to go anywhere outside of the core for that, it can remain cached internally. In fact, some CPUs may even make it observable while still in the store buffer, or write it to the cache speculatively - the actual decision point is when to make it respond to external snoops, the rest is implementation details. Architectures with more relaxed ordering may change the order unless explicitly blocked by a fence/barrier.

Based on that, your code snippet can not reorder stores on x86 since stores don't reorder with each other there, but it may be able to do so on arm for example. If the language requires strong ordering in that case, the compiler will have to decide if it can rely on the HW, or add a fence. Either way, anyone reading this value from another thread (or socket) will have to snoop for it, and can only see the writes that respond.

like image 35
Leeor Avatar answered Sep 18 '22 14:09

Leeor