GCC's reordering of read/write instructions

Tags: memory, compiler-optimization, gcc, linux-kernel, cpu

Linux's synchronization primitives (spinlocks, mutexes, RCU) use memory barrier instructions to prevent memory access instructions from being reordered. This reordering can be done either by the CPU itself or by the compiler.

Can someone show some examples of GCC-generated code where such reordering is done? I am interested mainly in x86. I am asking in order to understand how GCC decides which instructions can be reordered. Different x86 microarchitectures (for example, Sandy Bridge vs. Ivy Bridge) use different cache architectures, so I am wondering how GCC does effective reordering that helps execution performance irrespective of the cache architecture. Some example C code and the reordered GCC-generated code would be very useful. Thanks!

942

asked Feb 28 '14 22:02

Manohar

1 Answer

The reordering that GCC may do is unrelated to the reordering an (x86) CPU may do.

Let's start with compiler reordering. The C language rules forbid GCC from reordering volatile loads and stores with respect to each other, or deleting them, when a sequence point occurs between them (thanks to bobc for this clarification). That is, in the assembly output those memory accesses will appear, sequenced precisely in the order you specified. Non-volatile accesses, on the other hand, can be reordered with respect to all other accesses, volatile or not, provided that (by the as-if rule) the end result of the calculation is the same.
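As a rough sketch of that difference (the globals and function names here are made up for illustration), the two functions below perform the same update, once with plain accesses and once with volatile ones. Compiling with something like `gcc -O2 -S` and comparing the two is one way to look for reordering: for the plain version GCC may well emit the store to b before the store to a, although the exact output depends on the GCC version and flags.

```c
/* Hypothetical globals, used only for this illustration. */
int a, b;
volatile int va, vb;

/* Plain accesses: under the as-if rule the compiler is free to emit
   the store to b before the store to a. */
void update_plain(void)
{
    a = b + 1;
    b = 0;
}

/* Volatile accesses, separated by sequence points: the load of vb,
   the store to va and the store to vb must all appear in the
   assembly, in this order. */
void update_volatile(void)
{
    va = vb + 1;
    vb = 0;
}
```

In a single thread both functions leave the same values behind; the difference is only visible in the generated assembly, or to another thread or device watching memory.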

For instance, a non-volatile load in the C code could be done as many times as the code says, but in a different order (e.g. if the compiler finds it more convenient to do it earlier or later, when more registers are available). It could be done fewer times than the code says (e.g. if a copy of the value is still available in a register in the middle of a large expression). Or it could even be deleted (e.g. if the compiler can prove the load is useless, or if it has moved a variable entirely into a register).
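A minimal sketch of the "fewer times" case, with hypothetical function names: the source reads through the pointer twice in both functions, but only the volatile version obliges the compiler to emit two loads. With optimization enabled, GCC will typically load once in the plain version and reuse the register.

```c
/* Plain loads: the source reads *p twice, but the compiler may load
   the value once and reuse the register copy. */
int twice_plain(const int *p)
{
    int first = *p;
    int second = *p;
    return first + second;
}

/* Volatile loads: both reads must be performed, in this order,
   because a sequence point separates them. */
int twice_volatile(const volatile int *p)
{
    int first = *p;
    int second = *p;
    return first + second;
}
```

Both functions return the same result when nothing else modifies the value, which is exactly why the as-if rule lets the compiler drop the second plain load.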

To prevent compiler reordering at other times, you must use a compiler-specific barrier. With GCC, __asm__ __volatile__("":::"memory"); serves this purpose.
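Here is a small sketch of how that barrier might be used. The macro name COMPILER_BARRIER and the variables payload and ready are assumptions for this example (the Linux kernel wraps the same statement in a macro called barrier()). The empty asm emits no instruction; the "memory" clobber only tells GCC that memory may have changed, so it cannot move memory accesses across that point or carry cached values over it.

```c
/* Compiler-only barrier: generates no machine instruction, but GCC
   may not move memory accesses from one side of it to the other. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical shared variables for the illustration. */
int payload;
int ready;

void publish(int value)
{
    payload = value;      /* store the data first             */
    COMPILER_BARRIER();   /* keep the compiler from swapping  */
    ready = 1;            /* then store the flag              */
}
```

This constrains the compiler only. Whether the CPU may still reorder the two stores is a separate question, governed by the hardware memory model.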

This is different from CPU reordering, a.k.a. the memory-ordering model. Ancient CPUs executed instructions precisely in the order they appeared in the program; this is called program ordering, or the strong memory-ordering model. Modern CPUs, however, sometimes resort to "cheats" to run faster, by weakening the memory model a little.

The way x86 CPUs weaken the memory model is documented in Intel's Software Developer Manuals, Volume 3, Chapter 8, Section 8.2.2 "Memory Ordering in P6 and More Recent Processor Families". This is, in part, what it reads:

Reads are not reordered with other reads.
Writes are not reordered with older reads.
Writes to memory are not reordered with other writes, with [some] exceptions.
Reads may be reordered with older writes to different locations but not with older writes to the same location.
Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions.
Reads cannot pass earlier LFENCE and MFENCE instructions.
Writes cannot pass earlier LFENCE, SFENCE, and MFENCE instructions.
LFENCE instructions cannot pass earlier reads.
SFENCE instructions cannot pass earlier writes.
MFENCE instructions cannot pass earlier reads or writes.

It also gives very good examples of what can and cannot be reordered, in Section 8.2.3 "Examples Illustrating the Memory-Ordering Principles".

As you can see, one uses FENCE instructions to prevent an x86 CPU from reordering memory accesses inappropriately.
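As a hedged sketch of the one reordering the list above permits (a read passing an older write to a different location), the function below stores to one variable, issues a full fence, and then loads from another. It uses GCC's __atomic builtins rather than a hand-written MFENCE; on x86, GCC implements the sequentially consistent fence with MFENCE or a locked instruction, depending on the version. The function name and parameters are made up for this example.

```c
/* Store 1 to *store_to, then load *load_from, with a full fence in
   between so that the load cannot pass the older store. */
int store_then_load(int *store_to, int *load_from)
{
    __atomic_store_n(store_to, 1, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* full memory fence */
    return __atomic_load_n(load_from, __ATOMIC_RELAXED);
}
```

If one thread calls store_then_load(&x, &y) while another calls store_then_load(&y, &x), with x and y initially zero, the fence rules out the outcome where both calls return 0. Without the fence, that outcome is allowed on x86.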

Lastly, you may be interested in this link, which goes into further detail and comes with the assembly examples you crave.

162

answered Dec 06 '22 20:12

Iwillnotexist Idonotexist
