Is memory ordering in C++11 about main memory flush ordering?

Tags: c++, memory-model, atomic, stdatomic, memory-barriers

I'm not sure I fully understand (and I may have it all wrong) the concepts of atomicity and memory ordering in C++11. Let's take this simple single-threaded example:

#include <atomic>

int main()
{
    std::atomic<int> a(0);
    std::atomic<int> b(0);
    a.store(16);
    b.store(10);

    return 0;
}

In this single-threaded code, if a and b were not atomic types, the compiler could have reordered the instructions so that, in the assembly code, I have for instance a move instruction assigning 10 to 'b' before a move instruction assigning 16 to 'a'. So for me, because they are atomic variables, I am guaranteed to get the 'a' move instruction before the 'b' move instruction, as I wrote in my source code. After that, there is the processor with its execution units, instruction prefetching, and out-of-order engine. This processor can execute the 'b' instruction before the 'a' instruction, whatever the instruction order in the assembly code. So I can have 10 stored in a register, in the processor's store buffer, or in cache memory before 16 is stored in a register, store buffer, or cache.

And as I understand it, this is where the memory ordering model comes in. From that point on, if I leave the default model, sequentially consistent, it guarantees that flushing these values (10 and 16) out to main memory will respect the order in which I did the stores in my source code. So the processor will start by flushing the register or cache where 16 is stored into main memory to update 'a', and after that it will flush 10 into main memory for 'b'.

So this understanding leads me to think that if I use a relaxed memory model, only that last part is not guaranteed, and the flushes to main memory can happen in any order.

Sorry if I'm hard to read; my English is still poor. Thank you all for your time.


asked Apr 16 '15 08:04

jedib

2 Answers

The C++ memory model is about the abstract machine and value visibility, not about concrete things like "main memory", "write queues" or "flushing".

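In your example, the memory model states that since the write to a happens-before the write to b (it is sequenced before it in the same thread), any thread that reads the 10 from b must, on subsequent reads from a, see 16 (unless a has since been overwritten, of course). This relies on the default memory_order_seq_cst, which makes the store to b a release operation and the load that reads it an acquire operation.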

The important thing here is establishing happens-before relationships and value visibility. How this maps to caches and memory is up to the compiler. In my opinion, it's better to stay on that abstract level instead of trying to map the model to your understanding of the hardware, because:

- Your understanding of the hardware might be wrong. Hardware is even more complicated than the C++ memory model.
- Even if your understanding is correct now, a later version of the hardware might have a different model, at least in some subsystems.
- By mapping to one hardware model, you might make wrong assumptions about the implications for a different hardware model. For example, if you only understand how the memory model maps to x86 hardware, you will not understand the subtle difference between consume and acquire on PowerPC.
- The C++ model is very well suited for reasoning about correctness.


answered Sep 21 '22 14:09

Sebastian Redl

You didn't specify which architecture you work with, but basically each architecture has its own memory ordering model (sometimes more than one that you can choose from), and that serves as a "contract". The compiler has to be aware of that contract and use lightweight or heavyweight instructions accordingly, so that it guarantees what it needs in order to provide the language's memory model.

The hardware implementation under the hood can be quite complicated, but in a nutshell: you don't need to flush to main memory in order to get global visibility. Modern cache systems provide snooping capabilities, so a value can be globally visible and globally ordered while still residing in some private core cache (and having stale copies in lower cache levels); MESI-style coherence protocols control how this is handled correctly.

The life cycle of a write begins in the out-of-order engine, where it is still speculative (i.e., it can be discarded due to an older branch misprediction or fault). Naturally, during that time the write cannot be seen from the outside, so out-of-order execution is not relevant here. Once it commits, if the system guarantees store ordering (like x86), it still has to wait in line for its turn to become visible, so it is buffered. Other cores can't see it since its observation point hasn't been reached yet (although local loads in that core might see it through store-to-load forwarding in some implementations of x86; that's one of the differences between TSO and real sequential consistency). Once the older stores are done, the store may become globally visible. It doesn't have to go anywhere outside the core for that; it can remain cached internally. In fact, some CPUs may even make it observable while it is still in the store buffer, or write it to the cache speculatively. The actual decision point is when to make it respond to external snoops; the rest is implementation details. Architectures with more relaxed ordering may change the order unless explicitly blocked by a fence/barrier.

Based on that, your code snippet cannot reorder the stores on x86, since stores are not reordered with other stores there, but they may be reordered on ARM, for example. If the language requires strong ordering in that case, the compiler will have to decide whether it can rely on the hardware or must add a fence. Either way, anyone reading these values from another thread (or from another socket) will have to snoop for them, and can only see the writes that respond.

answered Sep 18 '22 14:09

Leeor
