Are there any architectures where a memory barrier is implemented with a cache flush? I've read that memory barriers affect only CPU reordering, but I've also read statements like "a memory barrier ensures all the CPUs will see the value...", which to me implies a cache flush/invalidation.
In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
The Linux kernel's barrier() macro, built on GCC inline assembly, is a pure compiler barrier: it emits no machine instruction and uses only a "memory" clobber. The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands, so the compiler may not reorder or cache memory accesses across it.
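For concreteness, here is the kernel-style definition together with a hypothetical use (the busy-wait loop and the `flag` variable are illustrations, not code from the question):

```c
/* Compiler-only barrier: expands to an empty asm statement whose
 * "memory" clobber forbids the compiler from reordering or caching
 * memory accesses across this point. No machine instruction is emitted. */
#define barrier() __asm__ __volatile__("" : : : "memory")

static int flag;   /* hypothetical: set to 1 by another context */

void wait_for_flag(void)
{
    while (!flag) {
        /* Without barrier(), the compiler could legally hoist the load
         * of `flag` out of the loop and spin forever; the clobber forces
         * a fresh load on every iteration. */
        barrier();
    }
}
```

Note that this constrains only the compiler; it does nothing about reordering performed by the CPU itself.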
The exact impact of a memory barrier depends on the specific architecture:
CPUs employ performance optimizations that can result in out-of-order execution. The reordering of memory operations (loads and stores) normally goes unnoticed within a single thread of execution, but causes unpredictable behaviour in concurrent programs and device drivers unless carefully controlled. The exact nature of an ordering constraint is hardware dependent, and defined by the architecture's memory ordering model. Some architectures provide multiple barriers for enforcing different ordering constraints.
http://en.wikipedia.org/wiki/Memory_barrier
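As a hedged sketch of that architecture dependence, a full-barrier macro might select the target's instruction at compile time (the mnemonics are real instructions for each architecture; the macro itself is illustrative, not a production API):

```c
/* Full memory barrier, chosen per architecture at compile time. */
#if defined(__x86_64__) || defined(__i386__)
/* x86 also offers lfence and sfence for load-only and store-only
 * ordering, matching "multiple barriers" in the quote above. */
#  define full_barrier() __asm__ __volatile__("mfence" ::: "memory")
#elif defined(__aarch64__)
#  define full_barrier() __asm__ __volatile__("dmb ish" ::: "memory")
#else
#  define full_barrier() __sync_synchronize()  /* GCC full-barrier builtin */
#endif
```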
Current Intel architectures ensure automatic cache coherence across all CPUs, without explicit use of memory barrier or cache flush instructions.
In symmetric multiprocessor (SMP) systems, each processor has a local cache. The memory system must guarantee cache coherence. False sharing occurs when threads on different processors modify variables that reside on the same cache line. This invalidates the cache line and forces an update, which hurts performance.
http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/
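As an illustration of the false-sharing point above (the 64-byte line size is an assumption, though it matches typical x86 parts):

```c
#include <stdalign.h>

/* Packed together, both counters share one cache line, so concurrent
 * writers on different cores ping-pong the line between their caches. */
struct counters_shared {
    long a;   /* written by thread 1 */
    long b;   /* written by thread 2 */
};

/* Giving each counter its own (assumed) 64-byte line removes the
 * false sharing at the cost of some padding. */
struct counters_padded {
    alignas(64) long a;
    alignas(64) long b;
};
```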
On pretty much all modern architectures, caches (such as the L1 and L2 caches) are kept coherent by hardware. There is no need to flush any cache to make a write visible to other CPUs.
One could imagine hypothetically a system that was not cache coherent in hardware, but it wouldn't look anything like the current systems that run operating systems like Windows and Linux.
Memory barriers are needed on these architectures to do three things:
1. The CPU may pre-fetch a read that's invalidated by a write on another core. This must be prevented. (Though on x86, this is prevented in hardware: the pre-fetch is locked to the L1 cache line, so if another CPU invalidates the cache line, the pre-fetch is invalidated as well.)
2. The CPU may "post" writes and not put them in its L1 cache yet. These writes must be completed at least to L1 cache.
3. The CPU may re-order reads and writes on one side of the memory barrier with reads and writes on the other side. Depending on the type of memory barrier, some of these re-orderings must be prohibited. (For example, `read x; read y;` doesn't ensure the reads happen in that order, but `read x; memory_barrier(); read y;` typically does; see the sketch after this list.)
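A minimal sketch of points 2 and 3 in portable C11 (the fences are standard <stdatomic.h>; the flag/data protocol itself is an assumed example):

```c
#include <stdatomic.h>

int data;                 /* plain payload */
atomic_int flag;          /* signals that data is ready */

/* Writer: the release fence acts as a write barrier, ordering the store
 * to data before the store to flag. */
void writer(void)
{
    data = 42;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Reader: the acquire fence acts as a read barrier; once flag is seen
 * as 1, the following read of data is guaranteed to see 42. Without
 * the fences, a weakly ordered CPU (or the compiler) may reorder either
 * pair of accesses. */
int reader(void)
{
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;                                  /* spin until published */
    atomic_thread_fence(memory_order_acquire);
    return data;
}
```

On x86 these fences typically compile to nothing but a compiler barrier, since the hardware model is already strong enough; on ARM they emit actual barrier instructions.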