Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GCC's reordering of read/write instructions

Linux's synchronization primitives (spinlock, mutex, RCUs) use memory barrier instructions to force the memory access instructions from getting re-ordered. And this reordering can be done either by the CPU itself or by the compiler.

Can someone show some examples of GCC produced code where such reordering is done ? I am interested mainly in x86. The reason I am asking this is to understand how GCC decides what instructions can be reordered. Different x86 mirco architectures (for ex: sandy bridge vs ivy bridge) use different cache architecture. Hence I am wondering how GCC does effective reordering that helps in the execution performance irrespective of the cache architecture. Some example C code and reordered GCC generated code would be very useful. Thanks!

like image 942
Manohar Avatar asked Feb 28 '14 22:02

Manohar


People also ask

Why compiler reorder instructions?

Compiler and hardware try to reorder programs in order to improve their efficiency, while respecting dependencies. Indeed their actions is complementary. Compiler can consider larger reorganisations than processor and uses more complex heuristics to do it.

What is instruction reordering?

Thinking in terms of instruction reordering lets us assume that once the instruction is finally executed, all cores will immediately see that new value in memory. It also lets us understand the CPU from the point of the user program rather than trying to understand the internals.

Can compiler reorder function calls?

If you mean that the difference can not be observed, then yes, the compiler (and even the CPU itself) is free to reorder the operations.

What does __ Sync_synchronize do?

__sync_synchronize This function synchronizes data in all threads. A full memory barrier is created when this function is invoked.


1 Answers

The reordering that GCC may do is unrelated to the reordering an (x86) CPU may do.

Let's start off with compiler reordering. The C language rules are such that GCC is forbidden from reordering volatile loads and store memory accesses with respect to each other, or deleting them, when a sequence point occurs between them (Thanks to bobc for this clarification). That is to say, in the assembly output, those memory accesses will appear, and will be sequenced precisely in the order you specified. Non-volatile accesses, on the other hand, can be reordered with respect to all other accesses, volatile or not, provided that (by the as-if rule) the end result of the calculation is the same.

For instance, a non-volatile load in the C code could be done as many times as the code says, but in a different order (e.g. If the compiler feels it's more convenient to do it earlier or later when more registers are available). It could be done fewer times than the code says (e.g. If a copy of the value happened to still be available in a register in the middle of a large expression). Or it could even be deleted (e.g. if the compiler can prove the uselessness of the load, or if it moved a variable entirely into a register).

To prevent compiler reorderings at other times, you must use a compiler-specific barrier. GCC uses __asm__ __volatile__("":::"memory"); for this purpose.

This is different from CPU reordering, a.k.a. the memory-ordering model. Ancient CPUs executed instructions precisely in the order they appeared in the program; This is called program ordering, or the strong memory-ordering model. Modern CPUs, however, sometimes resort to "cheats" to run faster, by weakening a little the memory model.

The way x86 CPUs weaken the memory model is documented in Intel's Software Developer Manuals, Volume 3, Chapter 8, Section 8.2.2 "Memory Ordering in P6 and More Recent Processor Families". This is, in part, what it reads:

  • Reads are not reordered with other reads.
  • Writes are not reordered with older reads.
  • Writes to memory are not reordered with other writes, with [some] exceptions.
  • Reads may be reordered with older writes to different locations but not with older writes to the same location.
  • Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions.
  • Reads cannot pass earlier LFENCE and MFENCE instructions.
  • Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
  • LFENCE instructions cannot pass earlier reads.
  • SFENCE instructions cannot pass earlier writes.
  • MFENCE instructions cannot pass earlier reads or writes.

It also gives very good examples of what can and cannot be reordered, in Section 8.2.3 "Examples Illustrating the Memory-Ordering Principles".

As you can see, one uses FENCE instructions to prevent an x86 CPU from reordering memory accesses inappropriately.

Lastly, you may be interested in this link, which goes into further detail and comes with the assembly examples you crave.

like image 162
Iwillnotexist Idonotexist Avatar answered Dec 06 '22 20:12

Iwillnotexist Idonotexist