I have a small test application that executes two threads simultaneously. One increments a static long _value, the other decrements it. I've used ProcessThread.ProcessorAffinity to pin the threads to different physical (non-HT) cores in order to force cross-core communication, and I have ensured that they overlap in execution time for a significant amount of time.
Of course, the following does not lead to zero:
for (long i = 0; i < 10000000; i++)
{
_value += offset;
}
So, the logical conclusion would be to:
for (long i = 0; i < 10000000; i++)
{
Interlocked.Add(ref _value, offset);
}
Which of course leads to zero.
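For completeness, here is a self-contained sketch of the kind of harness described above, using the Interlocked version (the class and method names are my own, and the processor-affinity pinning from the question is omitted for brevity):

```csharp
using System;
using System.Threading;

class InterlockedDemo
{
    static long _value;

    // Each worker adds `offset` to _value 10,000,000 times.
    static void Worker(long offset)
    {
        for (long i = 0; i < 10_000_000; i++)
        {
            Interlocked.Add(ref _value, offset);
        }
    }

    public static long Run()
    {
        _value = 0;
        var inc = new Thread(() => Worker(+1));
        var dec = new Thread(() => Worker(-1));
        inc.Start();
        dec.Start();
        inc.Join();
        dec.Join();
        return Interlocked.Read(ref _value);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 0
    }
}
```

With Interlocked.Add each read-modify-write is atomic, so the increments and decrements cancel exactly and the result is deterministically zero.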
However, the following also leads to zero:
for (long i = 0; i < 10000000; i++)
{
lock (_syncRoot)
{
_value += offset;
}
}
Of course, the lock statement ensures that the reads and writes are not reordered, because it employs a full fence. However, I cannot find any information concerning synchronization of the processor caches. If there were no cache synchronization, I would expect to see a deviation from 0 after both threads have finished.
Can someone explain to me how lock/Monitor.Enter/Monitor.Exit ensures that the processor caches (L1/L2 caches) are synchronized?
Cache coherence in this case does not depend on lock. The lock statement ensures that the two threads' instruction sequences are not interleaved. a += b is not atomic to the processor; it looks roughly like: load a into a register, add b to the register, store the register back to a. Without the lock, the two threads' load-add-store sequences may interleave, so one thread's store can overwrite the other's. But that's not about cache coherence; it's a higher-level issue.
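The lost-update interleaving can be simulated deterministically in plain single-threaded C# (the locals regA and regB stand in for each core's registers; this is an illustration of the interleaving, not actual JIT output):

```csharp
using System;

class LostUpdateDemo
{
    public static long Run()
    {
        long value = 0;

        // Thread A increments, thread B decrements, but B's load slips
        // in between A's load and A's store.
        long regA = value;   // A loads 0
        long regB = value;   // B loads 0 (before A has stored!)
        regA += 1;
        value = regA;        // A stores: value = 1
        regB -= 1;
        value = regB;        // B stores: value = -1, A's increment is lost

        return value;
    }

    static void Main() => Console.WriteLine(Run()); // prints -1
}
```

After this interleaving the net effect of one +1 and one -1 is -1 instead of 0, which is exactly why the unsynchronized loop drifts away from zero.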
So, lock does not ensure that the caches are synchronized. Cache synchronization is a processor-internal feature which does not depend on your code. You can read about it here.
When one core writes a value to memory and the second core then tries to read it, the second core's stale cache entry is invalidated, so a cache miss occurs, and that miss forces the entry to be refilled with the current value.
The CLR memory model guarantees (requires) that loads/stores can't cross a fence. It's up to the CLR implementers to enforce this on real hardware, which they do. However, this is based on the advertised / understood behavior of the hardware, which can be wrong.
The lock keyword is just syntactic sugar for a pair of System.Threading.Monitor.Enter() and System.Threading.Monitor.Exit() calls. The implementations of Monitor.Enter() and Monitor.Exit() put up a memory fence, which entails performing whatever cache flushing or invalidation the architecture requires. So your other thread won't proceed until it can see the stores that result from the execution of the locked section.