Joe Duffy gives six rules that describe the CLR 2.0+ memory model (its actual implementation, not any ECMA standard). I'm writing down my attempt at figuring these rules out, mostly as a way of rubber-ducking, but if I make a mistake in my logic, at least someone here will be able to catch it before it causes me grief.
x = y
y = 0 // Cannot move before the previous line according to Rule 1.
x = y
z = 0
// equates to this sequence of loads and stores before possible re-ordering
load y
store x
load 0
store z
Looking at this, it appears that the load of 0 can be moved up before the load of y, but the stores may not be re-ordered at all. Therefore, if a thread sees z == 0, it will also see x == y.
If y were volatile, then load 0 could not move before load y; otherwise it could. Volatile stores don't seem to have any special properties here, since no stores can be re-ordered with respect to each other anyway (which is a very strong guarantee!).
Full barriers are like a line in the sand over which loads and stores cannot be moved.
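In C#, a full barrier can be issued explicitly with Thread.MemoryBarrier(). A minimal sketch (my own illustration, not from Duffy's text; the field names are hypothetical):

```csharp
using System.Threading;

class BarrierSketch
{
    int a, b, x;  // hypothetical shared fields

    void Demo()
    {
        a = 1;                   // store a
        Thread.MemoryBarrier();  // full barrier: no load or store
                                 // may move across this point
        b = x;                   // the load of x cannot move above the
                                 // barrier, and the store to a cannot
                                 // move below it
    }
}
```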
No idea what rule 5 means.
I guess rule 6 means if you do:
x = y
x = z
Then it is possible for the CLR to delete both the load of y and the first store to x.
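If that's right, the transformation would look like this (my own sketch of what an optimizer could emit under rule 6, not anything from Duffy's text):

```csharp
// Original: two adjacent stores to the same location x
x = y;
x = z;

// After coalescing under rule 6: the load of y and the first store
// to x are deleted, because no other thread is guaranteed to observe
// the intermediate value of x
x = z;
```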
x = y
z = y
// equates to this sequence of loads and stores before possible re-ordering
load y
store x
load y
store z
// could be re-ordered like this
load y
load y
store x
store z
// rule 6 applied means this is possible?
load y
store x // but don't pop y from stack (or first duplicate item on top of stack)
store z
What if y were volatile? I don't see anything in the rules that prohibits the above optimization from being carried out. This does not break double-checked locking, because the lock() between the two identical conditions prevents the loads from becoming adjacent, and according to rule 6, adjacency is the only case in which they can be eliminated.
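For reference, the double-checked locking pattern I have in mind looks roughly like this (a sketch with hypothetical names; the point is that the lock sits between the two reads of _instance, so rule 6 cannot coalesce them):

```csharp
public sealed class Singleton
{
    private static volatile Singleton _instance;  // hypothetical field names
    private static readonly object _lock = new object();

    public static Singleton Instance
    {
        get
        {
            if (_instance == null)            // first read
            {
                lock (_lock)                  // full barrier: the second read
                {                             // cannot become adjacent to the
                    if (_instance == null)    // first, so it cannot be deleted
                        _instance = new Singleton();
                }
            }
            return _instance;
        }
    }
}
```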
So I think I understand all but rule 5 here. Anyone want to enlighten me (or correct me, or add something to any of the above)?
Joe Duffy discusses Rule 5 on pp. 517-518 of Concurrent Programming on Windows:
As an example of when a load might be introduced, consider this code:
MyObject mo = ...;
int f = mo.field;
if (f == 0)
{
    // do something
    Console.WriteLine(f);
}
If the period of time between the initial read of mo.field into variable f and the subsequent use of f in the Console.WriteLine was long enough, a compiler may decide it would be more efficient to reread mo.field twice. ... Doing this would be a problem if mo is a heap object and threads are writing concurrently to mo.field. The if-block may contain code that assumes the value read into f remained 0, and the introduction of reads could break this assumption. In addition to prohibiting this for volatile variables, the .NET memory model prohibits it for ordinary variables referring to GC heap memory too.
I blogged about one important place where this matters: the standard pattern for raising an event.
EventHandler handler = MyEvent;
if (handler != null)
handler(this, EventArgs.Empty);
In order to prevent problems when an event handler is removed on a separate thread, we read the current value of MyEvent into a local and only invoke the event handlers if that delegate is non-null. If reads from the heap could be introduced, the compiler/JIT might decide to read MyEvent again rather than using the local, which would introduce a race condition.