Difference between mfence and asm volatile ("" : : : "memory")

Tags:

x86

gcc

memory-barriers

As far as I understand, mfence is a hardware memory barrier while asm volatile ("" : : : "memory") is a compiler barrier. But can asm volatile ("" : : : "memory") be used in place of mfence?

The reason I have got confused is this link

989

asked Aug 29 '12 17:08

Neal

2 Answers

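Well, a hardware memory barrier is mostly needed on architectures that have weak memory ordering, and x86 and x64 don't have weak memory ordering. On x86/x64 all stores have release semantics and all loads have acquire semantics, so for acquire/release ordering you should only really need asm volatile ("" : : : "memory"). The exception is StoreLoad reordering: a store followed by a load from a different location can still be reordered by the CPU, and preventing that takes mfence or a locked instruction.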

For a good overview of both Intel and AMD, as well as references to the relevant manufacturer specs, see http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/

Generally things like "volatile" are used on a per-field basis where loads and stores to that field are natively atomic. Where loads and stores to a field are already atomic (i.e. the "operation" in question is a load or a store to a single field and thus the entire operation is atomic) the volatile field modifier or memory barriers are not needed on x86/x64. Portable code notwithstanding.

When it comes to "operations" that are not atomic (e.g. loads or stores to a field that is larger than a native word, or loads or stores to multiple fields within one "operation"), a means by which the operation can be viewed as atomic is required regardless of CPU architecture. Generally this is done by means of a synchronization primitive like a mutex. Mutexes (the ones I've used) include memory barriers to avoid issues like processor reordering, so you don't have to add extra memory barrier instructions. I generally consider not using synchronization primitives a premature optimization; but, the nature of premature optimization is, of course, 97% of the time :)
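As a rough sketch of the mutex approach described above, a two-field invariant can be kept consistent by taking a pthread mutex around every read and write. The `range` struct, `g_range`, and the function names are made up for illustration and do not come from the question.

```c
#include <pthread.h>

/* Hypothetical two-field invariant: lo <= hi should always hold. */
struct range {
    pthread_mutex_t lock;
    int lo;
    int hi;
};

static struct range g_range = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

/* Update both fields as one unit. The lock/unlock calls also supply the
 * compiler and CPU ordering, so no explicit fence appears here. */
void range_set(struct range *r, int lo, int hi)
{
    pthread_mutex_lock(&r->lock);
    r->lo = lo;
    r->hi = hi;
    pthread_mutex_unlock(&r->lock);
}

/* Read both fields as one unit, so a reader never sees a half-updated pair. */
void range_get(struct range *r, int *lo, int *hi)
{
    pthread_mutex_lock(&r->lock);
    *lo = r->lo;
    *hi = r->hi;
    pthread_mutex_unlock(&r->lock);
}
```

A caller would use `range_set(&g_range, 3, 7)` in one thread and `range_get(&g_range, &lo, &hi)` in another; neither side needs to think about mfence.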

Where you don't use a synchronization primitive and you're dealing with a multi-field invariant, memory barriers that ensure the processor does not reorder stores and loads to different memory locations are important.

Now, about leaving the "mfence" instruction out of asm volatile and only putting "memory" in the clobber list. From what I've been able to read:

If your assembler instructions access memory in an unpredictable fashion, add `memory' to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.

When they say "GCC" and don't mention anything about the CPU, this means it applies to only the compiler. The lack of "mfence" means there is no CPU memory barrier. You can verify this by disassembling the resulting binary. If no "mfence" instruction is issued (depending on the target platform) then it's clear the CPU is not being told to issue a memory fence.
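To make the compiler-only effect concrete, here is a minimal sketch, assuming GCC or a compatible compiler. The names `shared_flag`, `COMPILER_BARRIER`, and `read_twice`, and the file name in the comment, are hypothetical.

```c
/* Hypothetical shared variable; imagine another thread may change it. */
int shared_flag = 0;

/* Compiler-only barrier: the asm body is empty, so no instruction is emitted. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Without the barrier, GCC may load shared_flag once and reuse the register
 * for both reads. The "memory" clobber tells GCC that memory may have
 * changed, so the second read has to go back to memory. */
int read_twice(void)
{
    int first = shared_flag;
    COMPILER_BARRIER();
    int second = shared_flag;
    return first + second;
}

/* To inspect the generated code (assuming the file is saved as barrier.c):
 *   gcc -O2 -S barrier.c -o -
 * The listing should contain two reads of shared_flag and no mfence. */
```

The point of the exercise is the absence of any fence instruction in the assembly: the barrier constrains the optimizer and nothing else.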

Depending on the platform you're on and what you're trying to do, there may be something "better" or clearer... portability notwithstanding.

165

answered Oct 21 '22 04:10

Peter Ritchie

asm volatile ("" ::: "memory") is just a compiler barrier.
asm volatile ("mfence" ::: "memory") is both a compiler barrier and MFENCE
__sync_synchronize() is also a compiler barrier and a full memory barrier.

So asm volatile ("" ::: "memory") will not, by itself, prevent the CPU from reordering independent memory instructions. As pointed out, x86-64 has a strong memory model, but StoreLoad reordering is still possible. If your algorithm needs a full memory barrier to work, then you need __sync_synchronize() or an explicit mfence.
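The difference can be sketched with a Dekker-style entry check, which is the classic case where StoreLoad ordering matters. This is an illustration under assumptions, not a complete lock: `compiler_barrier`, `full_barrier`, `want`, `try_enter`, and `leave` are hypothetical names, and the mfence branch is only taken when compiling for x86-64.

```c
/* Compiler barrier only: constrains GCC, emits no instruction. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Compiler barrier plus a hardware full fence. */
static inline void full_barrier(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __sync_synchronize();   /* GCC builtin: full barrier on the target */
#endif
}

/* Hypothetical intent flags for two threads, numbered 0 and 1. */
volatile int want[2] = { 0, 0 };

/* Thread `me` announces intent, then checks the other thread's flag.
 * The store to want[me] followed by the load of want[other] is a StoreLoad
 * pair, which x86 may reorder, so the barrier between them must be real. */
int try_enter(int me)
{
    int other = 1 - me;
    want[me] = 1;
    full_barrier();          /* compiler_barrier() would not be enough here */
    if (want[other]) {
        want[me] = 0;        /* back off: the other thread also wants in */
        return 0;
    }
    return 1;
}

/* On x86 a plain store already has release ordering, so only the compiler
 * needs constraining here; other architectures would need a real barrier. */
void leave(int me)
{
    compiler_barrier();
    want[me] = 0;
}
```

If `full_barrier()` in `try_enter` were swapped for `compiler_barrier()`, both threads could read the other's flag as 0 and both return 1, which is exactly the reordering the compiler barrier cannot stop.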

answered Oct 21 '22 04:10

RubenLaguna
