I need to understand memory fences in multicore machines. Say I have this code
mov [_x], 1; mov r1, [_y]
mov [_y], 1; mov r2, [_x]
Now the unexpected results without memory fences would be that both r1 and r2 can be 0 after execution. In my opinion, to counter that problem, we should put memory fence in both codes, as putting it to only one would still not solve the problem. Something like as follows...
mov [_x], 1; memory_fence; mov r1, [_y]
mov [_y], 1; memory_fence; mov r2, [_x]
Is my understanding correct or am I still missing something? Assume the architecture is x86. Also, can someone tell me how to put memory fences in a C++ code?
Memory fence is a type of barrier instruction that causes a CPU or compiler to enforce ordering constraint on memory operations issued before and after the memory fence instruction. This typically means that operations issued prior to the fence are guaranteed to performed before operations issued after the fence.
Memory barriers are typically used when implementing low-level machine code that operates on memory shared by multiple devices. Such code includes synchronization primitives and lock-free data structures on multiprocessor systems, and device drivers that communicate with computer hardware.
A general memory barrier gives a guarantee that all the LOAD and STORE operations specified before the barrier will appear to happen before all the LOAD and STORE operations specified after the barrier with respect to the other components of the system.
There is no difference. "Fence" and "barrier" mean the same thing in this context. This is perhaps not the answer I'd click "useful" on although you usually deliver them :-) :-) (old time 7 bit emoji) It's sometimes hard to figure out the diff between certain words in English.
Fences serialize the operation that they fence (loads & stores), that is, no other operation may start till the fence is executed, but the fence will not execute till all preceding operations have completed. quoting intel makes the meaning of this a little more precise (taken from the MFENCE instruction, page 3-628, Vol. 2A, Intel Instruction reference):
This serializing operation guarantees that every load and store instruction that precedes the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows the MFENCE instruction.1
- A load instruction is considered to become globally visible when the value to be loaded into its destination register is determined.
Using fences in C++ is tricky (C++11 may have fence semantics somewhere, maybe someone else has info on that), as it is platform and compiler dependent. For x86 using MSVC or ICC, you can use the _mm_lfence
, _mm_sfence
& _mm_mfence
for load, store and load + store fencing (note that some of these are SSE2 instructions).
Note: this assumes an Intel perspective, that is: one using an x86 (32 or 64 bit) or IA64 processor
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With