Simplified question:
Is there a difference in the timing of memory cache coherency (or "flushing") caused by Interlocked operations compared to memory barriers? Consider, in C#, any Interlocked operation vs. Thread.MemoryBarrier(). I believe there is a difference.
Background:
I have read quite a lot about memory barriers and how they prevent specific kinds of reordering of memory instructions, but I couldn't find consistent information on whether they should also cause immediate flushing of read/write queues.
I actually found a few sources mentioning that there is NO guarantee about the immediacy of the operation (only the prevention of specific reordering is guaranteed). E.g.:
Wikipedia: "However, to be clear, it does not mean any operations WILL have completed by the time the barrier completes; only the ORDERING of the completion of operations (when they do complete) is guaranteed"
Freebsd.org (barriers are HW specific, so I guess a specific OS doesn't matter): "memory barriers simply determine relative order of memory operations; they do not make any guarantee about timing of memory operations"
On the other hand, Interlocked operations, by their definition, cause the memory subsystem to lock the entire cache line holding the value, to prevent access (including reads) from any other CPU/core until the operation is done.
Am I correct or am I mistaken?
Disclaimer:
This is an evolution of my original question: "Variable freshness guarantee in .NET (volatile vs. volatile read)".
EDIT1: Fixed my statement about Interlocked operations, inline in the text.
EDIT2: Completely removed the demonstration code and its discussion (as some complained about too much information).
To understand C# interlocked operations, you need to understand Win32 interlocked operations.
The "pure" interlocked operations themselves only affect the freshness of the data directly referenced by the operation.
But in Win32, interlocked operations have traditionally implied a full memory barrier. I believe this is mostly to avoid breaking old programs on newer hardware. So InterlockedAdd does two things: an interlocked add (very cheap, does not affect caches) and a full memory barrier (a rather heavy operation).
Later, Microsoft realized this is expensive, and added versions of each operation that perform no memory barrier or only a partial one.
So there are now (in the Win32 world) four versions of almost everything: e.g. InterlockedAdd (full fence), InterlockedAddAcquire (read fence), InterlockedAddRelease (write fence), and the pure InterlockedAddNoFence (no fence).
In the C# world, there is only one version, and it matches the "classic" InterlockedAdd, which also performs the full memory fence.
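To make the last point concrete, here is a minimal C# sketch, not taken from the original discussion. The class and member names (InterlockedFenceDemo, CountWithInterlocked, PublishAndRead) are hypothetical and chosen only for illustration. The first method uses Interlocked.Add as a plain atomic counter; the second relies on the full fence that comes with Interlocked.Exchange to publish an ordinary, non-volatile field. Note that it demonstrates ordering only: the reader spins until the flag becomes visible, and nothing here says how quickly that happens.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; all names are illustrative assumptions.
public static class InterlockedFenceDemo
{
    private static int _counter;  // modified only through Interlocked
    private static int _payload;  // plain, non-volatile field
    private static int _ready;    // flag, accessed only through Interlocked

    // Each worker atomically adds 1 to the shared counter `iterations` times.
    public static int CountWithInterlocked(int workers, int iterations)
    {
        _counter = 0;
        var threads = new Thread[workers];
        for (int i = 0; i < workers; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < iterations; j++)
                    Interlocked.Add(ref _counter, 1); // atomic add + full fence
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        return _counter;
    }

    // Publishes a plain write by following it with an interlocked operation.
    public static int PublishAndRead()
    {
        _payload = 0;
        _ready = 0;
        int observed = -1;

        var reader = new Thread(() =>
        {
            // CompareExchange(ref x, 0, 0) is used here as an atomic read
            // with full-fence semantics. Spin until the flag is seen.
            while (Interlocked.CompareExchange(ref _ready, 0, 0) == 0)
                Thread.Yield();
            observed = _payload; // cannot be reordered before the flag read
        });
        reader.Start();

        _payload = 42;                        // ordinary write
        Interlocked.Exchange(ref _ready, 1);  // the write above cannot move past this

        reader.Join();
        return observed;
    }
}
```

Calling InterlockedFenceDemo.CountWithInterlocked(4, 100000) should return 400000, and InterlockedFenceDemo.PublishAndRead() should return 42, because the reader can only leave its loop after the flag is set, and the payload write is ordered before the flag write.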