Linux's synchronization primitives (spinlock, mutex, RCUs) use memory barrier instructions to force the memory access instructions from getting re-ordered. And this reordering can be done either by the CPU itself or by the compiler.
Can someone show some examples of GCC produced code where such reordering is done ? I am interested mainly in x86. The reason I am asking this is to understand how GCC decides what instructions can be reordered. Different x86 mirco architectures (for ex: sandy bridge vs ivy bridge) use different cache architecture. Hence I am wondering how GCC does effective reordering that helps in the execution performance irrespective of the cache architecture. Some example C code and reordered GCC generated code would be very useful. Thanks!
Compiler and hardware try to reorder programs in order to improve their efficiency, while respecting dependencies. Indeed their actions is complementary. Compiler can consider larger reorganisations than processor and uses more complex heuristics to do it.
Thinking in terms of instruction reordering lets us assume that once the instruction is finally executed, all cores will immediately see that new value in memory. It also lets us understand the CPU from the point of the user program rather than trying to understand the internals.
If you mean that the difference can not be observed, then yes, the compiler (and even the CPU itself) is free to reorder the operations.
__sync_synchronize This function synchronizes data in all threads. A full memory barrier is created when this function is invoked.
The reordering that GCC may do is unrelated to the reordering an (x86) CPU may do.
Let's start off with compiler reordering. The C language rules are such that GCC is forbidden from reordering volatile
loads and store memory accesses with respect to each other, or deleting them, when a sequence point occurs between them (Thanks to bobc for this clarification). That is to say, in the assembly output, those memory accesses will appear, and will be sequenced precisely in the order you specified. Non-volatile
accesses, on the other hand, can be reordered with respect to all other accesses, volatile
or not, provided that (by the as-if rule) the end result of the calculation is the same.
For instance, a non-volatile
load in the C code could be done as many times as the code says, but in a different order (e.g. If the compiler feels it's more convenient to do it earlier or later when more registers are available). It could be done fewer times than the code says (e.g. If a copy of the value happened to still be available in a register in the middle of a large expression). Or it could even be deleted (e.g. if the compiler can prove the uselessness of the load, or if it moved a variable entirely into a register).
To prevent compiler reorderings at other times, you must use a compiler-specific barrier. GCC uses __asm__ __volatile__("":::"memory");
for this purpose.
This is different from CPU reordering, a.k.a. the memory-ordering model. Ancient CPUs executed instructions precisely in the order they appeared in the program; This is called program ordering, or the strong memory-ordering model. Modern CPUs, however, sometimes resort to "cheats" to run faster, by weakening a little the memory model.
The way x86 CPUs weaken the memory model is documented in Intel's Software Developer Manuals, Volume 3, Chapter 8, Section 8.2.2 "Memory Ordering in P6 and More Recent Processor Families". This is, in part, what it reads:
It also gives very good examples of what can and cannot be reordered, in Section 8.2.3 "Examples Illustrating the Memory-Ordering Principles".
As you can see, one uses FENCE instructions to prevent an x86 CPU from reordering memory accesses inappropriately.
Lastly, you may be interested in this link, which goes into further detail and comes with the assembly examples you crave.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With