I thought that until recently. Volatile reads aren't what you think they are - they're not about guaranteeing that they get the most recent value; they're about making sure that no read which is later in the program code is moved to before this read. That's what the spec guarantees - and likewise for volatile writes, it guarantees that no earlier write is moved to after the volatile one.
You're not alone in suspecting this code, but Joe Duffy explains it better than I can :)
My answer to this is to give up on lock-free coding other than by using things like PFX which are designed to insulate me from it. The memory model is just too hard for me - I'll leave it to the experts, and stick with things that I know are safe.
One day I'll update my threading article to reflect this, but I think I need to be able to discuss it more sensibly first...
(I don't know about the no-inlining part, btw. I suspect that inlining could introduce some other optimizations which aren't meant to happen around volatile reads/writes, but I could easily be wrong...)
Maybe I am oversimplifying, but I think the explanations about reordering and cache coherency and so on give too much details.
So, why the MemoryBarrier comes after the actual read? I will try to explain this with an example that uses object instead of int.
One may think the correct is: Thread 1 creates the object (initializes its inner data). Thread 1 then puts the object into a variable. Then it "does a fence" and all threads see the new value.
Then, the read is something like this: Thread 2 "does a fence". Thread 2 reads the object instance. Thread 2 is sure that it has all the inner data of that instance (as it started with a fence).
The biggest problem with this is: Thread 1 creates the object and initializes it. Thread 1 then puts the object into a variable. Before the Thread flushes the cache, the CPU itself flushes part of the cache... it commits only the address of the variable (not the contents of that variable).
At that moment, Thread 2 had already flushed its cache. So it is going to read everything from the main memory. So, it reads the variable (it is there). Then it reads the content (it is not there).
Finally, after all this, the CPU 1 executes the Thread 1 that does the fence.
So, what happens with the volatile write and read? The volatile write makes the contents of the object go to the memory immediately (starts by the fence), then they set the variable (with may not go immediatelly to the real memory). Then, the volatile read will first clear the cache. Then it reads the field. If it receives a value when reading the field, it is certain that the contents pointed by that reference are really there.
By those little things, yes, it is possible that you do a VolatileWrite(1) and another thread still see the value of zero. But as soon other threads see the value of 1 (using a volatile read), all other items needed that may be referenced are already there. You can't really tell it as when reading the old value (0 or null) you may simple not progress considering that you don't still have everything that you need.
I already saw some discussions that, even if that flushes the caches twice, the right pattern will be:
MemoryBarrier - will flush other variables changed before this call
Write
MemoryBarrier - will guarantee that the write was flushed
The Read will then need the same:
MemoryBarrier
Read - Guarantees that we see the latest info... maybe one that was put AFTER our memory barrier.
As something may have appeared after our MemoryBarrier and was already read, we must put another MemoryBarrier to access the contents.
Those could be two Write-Fences or two Read-Fences if that existed in .Net.
I am not sure on everything I said... that is a "compilation" of many information I got and it really explains why the VolatileRead and VolatileWrite appear to be reversed, but it also guarantees that no invalid values are read when using them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With