As far as I have understood, mfence
is a hardware memory barrier while asm volatile ("" : : : "memory")
is a compiler barrier. But,can asm volatile ("" : : : "memory")
be used in place of mfence.
The reason I have got confused is this link
Volatile is used to force it to read a fresh timestamp. When used alone, like this: __asm__ __volatile__ ("") It will not actually execute anything. You can extend this, though, to get a compile-time memory barrier that won't allow reordering any memory access instructions: __asm__ __volatile__ ("":::"memory")
In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
Complier Barriers. A compiler barrier is a sequence point. At such a point, we want all previous operations to have stored their results to memory, and we want all future operations to not have been started yet. The most common sequence point is a function call.
Memory fence is a type of barrier instruction that causes a CPU or compiler to enforce ordering constraint on memory operations issued before and after the memory fence instruction. This typically means that operations issued prior to the fence are guaranteed to performed before operations issued after the fence.
Well, a memory barrier is only needed on architectures that have weak memory ordering. x86 and x64 don't have weak memory ordering. on x86/x64 all stores have a release fence and all loads have an acquire fence. so, you should only really need asm volatile ("" : : : "memory")
For a good overview of both Intel and AMD as well as references to the relavent manufacturer specs, see http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
Generally things like "volatile" are used on a per-field basis where loads and stores to that field are natively atomic. Where loads and stores to a field are already atomic (i.e. the "operation" in question is a load or a store to a single field and thus the entire operation is atomic) the volatile
field modifier or memory barriers are not needed on x86/x64. Portable code notwithstanding.
When it comes to "operations" that are not atomic--e.g. loads or stores to a field that is larger than a native word or loads or stores to multiple fields within an "operation"--a means by which the operation can be viewed as atomic are required regardless of CPU architecture. generally this is done by means of a synchronization primitive like a mutex. Mutexes (the ones I've used) include memory barriers to avoid issues like processor reordering so you don't have to add extra memory barrier instructions. I generally consider not using synchronization primitives a premature optimization; but, the nature of premature optimization is, of course, 97% of the time :)
Where you don't use a synchronization primitive and you're dealing with a multi-field invariant, memory barriers that ensure the processor does not reorder stores and loads to different memory locations is important.
Now, in terms of not issuing an "mfence" instruction in asm volatile but using "memory" in the clobber list. From what I've been able to read
If your assembler instructions access memory in an unpredictable fashion, add `memory' to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.
When they say "GCC" and don't mention anything about the CPU, this means it applies to only the compiler. The lack of "mfence" means there is no CPU memory barrier. You can verify this by disassembling the resulting binary. If no "mfence" instruction is issued (depending on the target platform) then it's clear the CPU is not being told to issue a memory fence.
Depending on the platform you're on and what you're trying to do, there maybe something "better" or more clear... portability not withstanding.
asm volatile ("" ::: "memory")
is just a compiler barrier. asm volatile ("mfence" ::: "memory")
is both a compiler barrier and MFENCE
__sync_synchronize()
is also a compiler barrier and a full memory barrier. so asm volatile ("" ::: "memory")
will not prevent CPU reordering independent data instructions per se. As pointed out x86-64 has a strong memory model, but StoreLoad reordering is still possible. If a full memory barrier is needed for your algorithm to work then you neeed __sync_synchronize
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With