C++ Memory Barriers for Atomics

Tags: c++, windows, visual-c++, memory-barriers

I'm a newbie when it comes to this. Could anyone provide a simplified explanation of the differences between the following memory barriers?

The Windows MemoryBarrier();
The fence _mm_mfence();
The inline assembly asm volatile ("" : : : "memory");
The intrinsic _ReadWriteBarrier();

If there isn't a simple explanation, some links to good articles or books would probably help me get it straight. Until now I was fine just using objects written by others that wrap these calls, but I'd like a better understanding than my current thinking, which is basically that there is more than one way to implement memory barriers under the covers.

asked Jan 12 '12 20:01

AJG85

1 Answer

Both MemoryBarrier (MSVC) and _mm_mfence (supported by several compilers) provide a hardware memory fence, which prevents the processor from moving reads and writes across the fence.
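
To see what a full hardware fence buys you, here is a minimal sketch of the classic "store buffer" litmus test. It uses the portable C++11 `std::atomic_thread_fence(std::memory_order_seq_cst)` instead of `MemoryBarrier` or `_mm_mfence` (on x86/x64, mainstream compilers typically lower it to `mfence` or a `lock`-prefixed instruction). The function names are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: one round of the "store buffer" litmus test.
// Each thread stores 1 to its own flag, issues a full fence, then reads the
// other thread's flag. Without the fences, x86 may let both loads return 0
// because each store can still be waiting in its core's store buffer.
// With a seq_cst fence in each thread, at least one load must see 1.
inline bool both_saw_zero_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst); // full fence
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst); // full fence
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join(); // join() makes r1/r2 safely visible to this thread
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Runs the test repeatedly and counts the outcome the fences forbid.
inline int count_forbidden_outcomes(int rounds) {
    int hits = 0;
    for (int i = 0; i < rounds; ++i)
        if (both_saw_zero_once()) ++hits;
    return hits;
}
```

Because thread start-up dominates each round, the forbidden outcome is rare even without the fences, so a zero count is a sanity check rather than a proof.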

The main difference is that MemoryBarrier has platform-specific implementations for x86, x64 and IA64, whereas _mm_mfence always emits the SSE2 mfence instruction, so it is not available on targets without SSE2.

On x86 and x64, MemoryBarrier is implemented with an xchg and a lock or instruction, respectively, and I have seen some claims that this is faster than mfence. However, my own benchmarks show the opposite, so the relative cost apparently depends heavily on the processor model.

Another difference is that mfence can also be used to order non-temporal stores/loads (movntq etc.).
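
A minimal sketch of the non-temporal case, assuming an x86/x64 target with SSE2: `_mm_stream_si32` emits a `movnti` store, which is weakly ordered even on x86, so a fence is needed before the data is published to another thread. The buffer, flag and function names are hypothetical. For stores alone an `_mm_sfence` would also suffice; `_mm_mfence` is used here to match the answer.

```cpp
#include <atomic>
#include <emmintrin.h> // SSE2 intrinsics: _mm_stream_si32, _mm_mfence (x86/x64 only)
#include <thread>

constexpr int kCount = 1024;           // hypothetical buffer size
alignas(64) int g_buffer[kCount];      // hypothetical data buffer
std::atomic<bool> g_ready{false};      // hypothetical "data is ready" flag

// Fills the buffer with non-temporal stores, fences, then publishes.
inline void produce() {
    for (int i = 0; i < kCount; ++i)
        _mm_stream_si32(&g_buffer[i], i * 2);        // movnti: bypasses the cache
    _mm_mfence();                                    // orders the NT stores before the flag store
    g_ready.store(true, std::memory_order_release);  // publish
}

// Waits for the flag, then sums the buffer.
inline long long consume() {
    while (!g_ready.load(std::memory_order_acquire)) {
        // spin until the producer publishes
    }
    long long sum = 0;
    for (int i = 0; i < kCount; ++i) sum += g_buffer[i];
    return sum;
}
```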

GCC also has __sync_synchronize, which generates a full hardware fence.
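
If you want a single name for "full hardware fence" across the compilers mentioned here, a minimal sketch might look like this. `full_fence` is a hypothetical helper, and the `std::atomic_thread_fence` fallback is the standard C++11 equivalent rather than something taken from the answer.

```cpp
#include <atomic>
#if defined(_MSC_VER)
#include <windows.h> // MemoryBarrier()
#endif

// Hypothetical helper: full hardware fence (also acts as a compiler fence).
inline void full_fence() {
#if defined(_MSC_VER)
    MemoryBarrier();        // xchg on x86, lock or on x64
#elif defined(__GNUC__)
    __sync_synchronize();   // GCC/Clang builtin: full memory barrier
#else
    std::atomic_thread_fence(std::memory_order_seq_cst); // portable C++11 fallback
#endif
}
```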

asm volatile ("" : : : "memory") in GCC and _ReadWriteBarrier in MSVC only provide a compiler-level memory fence, which prevents the compiler from reordering memory accesses across it. The processor itself is still free to reorder those accesses at run time.
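
The compiler-only barriers can be wrapped the same way. `compiler_barrier`, `g_counter` and `read_twice` are hypothetical names, and `std::atomic_signal_fence` is used as the closest standard C++11 analogue for compilers that support neither syntax.

```cpp
#include <atomic>
#if defined(_MSC_VER) && !defined(__GNUC__)
#include <intrin.h> // _ReadWriteBarrier()
#endif

// Hypothetical helper: a compiler-only barrier. It emits no CPU instruction;
// it only stops the compiler from moving memory accesses across this point.
inline void compiler_barrier() {
#if defined(__GNUC__)
    asm volatile("" : : : "memory");                     // GCC/Clang
#elif defined(_MSC_VER)
    _ReadWriteBarrier();                                 // MSVC
#else
    std::atomic_signal_fence(std::memory_order_seq_cst); // standard analogue
#endif
}

int g_counter = 0; // hypothetical global variable

// Reads g_counter twice. The barrier stops the compiler from reusing the
// first load for the second one; the CPU itself is not constrained.
inline int read_twice() {
    int a = g_counter;
    compiler_barrier();
    int b = g_counter;
    return a + b;
}
```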

Compiler fences are generally used in combination with operations that already carry some kind of implicit hardware fence. For example, on x86/x64 every ordinary store already has release semantics and every ordinary load has acquire semantics, so a compiler fence is all you need when implementing load-acquire and store-release.
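
In portable C++11, the load-acquire/store-release pairing is usually written with `std::atomic` and explicit memory orders. A minimal message-passing sketch follows; the names are hypothetical. On x86/x64, mainstream compilers typically emit a plain `mov` for both the release store and the acquire load, so the only cost is the compiler-level ordering; on AArch64 the same source uses dedicated acquire/release instructions such as `stlr`/`ldar`.

```cpp
#include <atomic>
#include <thread>

int g_payload = 0;                     // hypothetical plain (non-atomic) data
std::atomic<bool> g_published{false};  // hypothetical publication flag

// Writes the payload, then publishes it with a store-release.
inline void producer() {
    g_payload = 42;                                      // ordinary store
    g_published.store(true, std::memory_order_release);  // store-release
}

// Spins on a load-acquire; once it sees true, the payload write is visible.
inline int consumer() {
    while (!g_published.load(std::memory_order_acquire)) {
        // spin until published
    }
    return g_payload;
}
```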

answered Sep 24 '22 10:09

Timo
