I'm worried about the correctness of the seemingly-standard pre-C#6 pattern for firing an event:
EventHandler localCopy = SomeEvent;
if (localCopy != null)
localCopy(this, args);
I've read Eric Lippert's Events and races and know that there is a remaining issue of calling a stale event handler, but my worry is whether the compiler/JITter is allowed to optimize away the local copy, effectively rewriting the code as
if (SomeEvent != null)
SomeEvent(this, args);
with possible NullReferenceException
.
According to the C# Language Specification, §3.10,
The critical execution points at which the order of these side effects must be preserved are references to volatile fields (§10.5.3), lock statements (§8.12), and thread creation and termination.
— so there are no critical execution points are in the mentioned pattern, and the optimizer is not constrained by that.
The related answer by Jon Skeet (year 2009) states
The JIT isn't allowed to perform the optimization you're talking about in the first part, because of the condition. I know this was raised as a spectre a while ago, but it's not valid. (I checked it with either Joe Duffy or Vance Morrison a while ago; I can't remember which.)
— but comments refer to this blog post (year 2008): Events and Threads (Part 4), which basically says that CLR 2.0's JITter (and probably subsequent versions?) must not introduce reads or writes, so there must be no problem under Microsoft .NET. But this seems to say nothing about other .NET implementations.
[Side note: I don't see how non-introducing of reads proves the correctness of the said pattern. Couldn't JITter just see some stale value of SomeEvent
in some other local variable and optimize out one of the reads, but not the other? Perfectly legitimate, right?]
Moreover, this MSDN article (year 2012): The C# Memory Model in Theory and Practice by Igor Ostrovsky states the following:
Non-Reordering Optimizations Some compiler optimizations may introduce or eliminate certain memory operations. For example, the compiler might replace repeated reads of a field with a single read. Similarly, if code reads a field and stores the value in a local variable and then repeatedly reads the variable, the compiler could choose to repeatedly read the field instead.
Because the ECMA C# spec doesn’t rule out the non-reordering optimizations, they’re presumably allowed. In fact, as I’ll discuss in Part 2, the JIT compiler does perform these types of optimizations.
This seems to contradict the Jon Skeet's answer.
As now C# is not a Windows-only language any more, the question arises whether the validity of the pattern is a consequence of limited JITter optimizations in the current CLR implementation, or it is expected property of the language.
So, the question is following: is the pattern being discussed valid from the point of view of C#-the-language? (That implies whether a language compiler/runtime is required to prohibit certain kind of optimizations.)
Of course, normative references on the topic are welcome.
According to the sources you provided and a few others in the past, it breaks down to this:
With the Microsoft implementation, you can rely on not having read introduction [1] [2] [3]
For any other implementation, it may have read introduction unless it states otherwise
EDIT: Having re-read the ECMA CLI specification carefully, read introductions are possible, but constrained. From Partition I, 12.6.4 Optimization:
Conforming implementations of the CLI are free to execute programs using any technology that guarantees, within a single thread of execution, that side-effects and exceptions generated by a thread are visible in the order specified by the CIL. For this purpose only volatile operations (including volatile reads) constitute visible side-effects. (Note that while only volatile operations constitute visible side-effects, volatile operations also affect the visibility of non-volatile references.)
A very important part of this paragraph is in parentheses:
Note that while only volatile operations constitute visible side-effects, volatile operations also affect the visibility of non-volatile references.
So, if the generated CIL reads a field only once, the implementation must behave the same. If it introduces reads, it's because it can prove that the subsequent reads will yield the same result, even facing side effects from other threads. If it cannot prove that and it still introduces reads, it's a bug.
In the same manner, C# the language also constrains read introduction at the C#-to-CIL level. From the C# Language Specification Version 5.0, 3.10 Execution Order:
Execution of a C# program proceeds such that the side effects of each executing thread are preserved at critical execution points. A side effect is defined as a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception. The critical execution points at which the order of these side effects must be preserved are references to volatile fields (§10.5.3),
lock
statements (§8.12), and thread creation and termination. The execution environment is free to change the order of execution of a C# program, subject to the following constraints:
Data dependence is preserved within a thread of execution. That is, the value of each variable is computed as if all statements in the thread were executed in original program order.
Initialization ordering rules are preserved (§10.5.4 and §10.5.5).
The ordering of side effects is preserved with respect to volatile reads and writes (§10.5.3). Additionally, the execution environment need not evaluate part of an expression if it can deduce that that expression’s value is not used and that no needed side effects are produced (including any caused by calling a method or accessing a volatile field). When program execution is interrupted by an asynchronous event (such as an exception thrown by another thread), it is not guaranteed that the observable side effects are visible in the original program order.
The point about data dependence is the one I want to emphasize:
Data dependence is preserved within a thread of execution. That is, the value of each variable is computed as if all statements in the thread were executed in original program order.
As such, looking at your example (similar to the one given by Igor Ostrovsky [4]):
EventHandler localCopy = SomeEvent;
if (localCopy != null)
localCopy(this, args);
The C# compiler should not perform read introduction, ever. Even if it can prove that there are no interfering accesses, there's no guarantee from the underlying CLI that two sequential non-volatile reads on SomeEvent
will have the same result.
Or, using the equivalent null conditional operator since C# 6.0:
SomeEvent?.Invoke(this, args);
The C# compiler should always expand to the previous code (guaranteeing a unique non-conflicting variable name) without performing read introduction, as that would leave the race condition.
The JIT compiler should only perform the read introduction if it can prove that there are no interfering accesses, depending on the underlying hardware platform, such that the two sequential non-volatile reads on SomeEvent
will in fact have the same result. This may not be the case if, for instance, the value is not kept in a register and if the cache may be flushed between reads.
Such optimization, if local, can only be performed on plain (non-ref and non-out) parameters and non-captured local variables. With inter-method or whole program optimizations, it can be performed on shared fields, ref or out parameters and captured local variables that can be proven they are never visibly affected by other threads.
So, there's a big difference whether it's you writing the following code or the C# compiler generating the following code, versus the JIT compiler generating machine code equivalent to the following code, as the JIT compiler is the only one capable of proving if the introduced read is consistent with the single thread execution, even facing potential side-effects caused by other threads:
if (SomeEvent != null)
SomeEvent(this, args);
An introduced read that may yield a different result is a bug, even according to the standard, as there's an observable difference were the code executed in program order without the introduced read.
As such, if the comment in Igor Ostrovsky's example [4] is true, I say it's a bug.
[1]: A comment by Eric Lippert; quoting:
To address your point about the ECMA CLI spec and the C# spec: the stronger memory model promises made by CLR 2.0 are promises made by Microsoft. A third party that decided to make their own implementation of C# that generates code that runs on their own implementation of CLI could choose a weaker memory model and still be compliant with the specifications. Whether the Mono team has done so, I do not know; you'll have to ask them.
[2]: CLR 2.0 memory model by Joe Duffy, reiterating the next link; quoting the relevant part:
- Rule 1: Data dependence among loads and stores is never violated.
- Rule 2: All stores have release semantics, i.e. no load or store may move after one.
- Rule 3: All volatile loads are acquire, i.e. no load or store may move before one.
- Rule 4: No loads and stores may ever cross a full-barrier (e.g. Thread.MemoryBarrier, lock acquire, Interlocked.Exchange, Interlocked.CompareExchange, etc.).
- Rule 5: Loads and stores to the heap may never be introduced.
- Rule 6: Loads and stores may only be deleted when coalescing adjacent loads and stores from/to the same location.
[3]: Understand the Impact of Low-Lock Techniques in Multithreaded Apps by Vance Morrison, the latest snapshot I could get on the Internet Archive; quoting the relevant portion:
Strong Model 2: .NET Framework 2.0
(...)
- All the rules that are contained in the ECMA model, in particular the three fundamental memory model rules as well as the ECMA rules for volatile.
- Reads and writes cannot be introduced.
- A read can only be removed if it is adjacent to another read to the same location from the same thread. A write can only be removed if it is adjacent to another write to the same location from the same thread. Rule 5 can be used to make reads or writes adjacent before applying this rule.
- Writes cannot move past other writes from the same thread.
- Reads can only move earlier in time, but never past a write to the same memory location from the same thread.
[4]: C# - The C# Memory Model in Theory and Practice, Part 2 by Igor Ostrovsky, where he shows a read introduction example that, according to him, the JIT may perform such that two consequent reads may have different results; quoting the relevant part:
Read Introduction As I just explained, the compiler sometimes fuses multiple reads into one. The compiler can also split a single read into multiple reads. In the .NET Framework 4.5, read introduction is much less common than read elimination and occurs only in very rare, specific circumstances. However, it does sometimes happen.
To understand read introduction, consider the following example:
public class ReadIntro {
private Object _obj = new Object();
void PrintObj() {
Object obj = _obj;
if (obj != null) {
Console.WriteLine(obj.ToString());
// May throw a NullReferenceException
}
}
void Uninitialize() {
_obj = null;
}
}
If you examine the PrintObj method, it looks like the obj value will never be null in the obj.ToString expression. However, that line of code could in fact throw a NullReferenceException. The CLR JIT might compile the PrintObj method as if it were written like this:
void PrintObj() {
if (_obj != null) {
Console.WriteLine(_obj.ToString());
}
}
Because the read of the _obj field has been split into two reads of the field, the ToString method may now be called on a null target.
Note that you won’t be able to reproduce the NullReferenceException using this code sample in the .NET Framework 4.5 on x86-x64. Read introduction is very difficult to reproduce in the .NET Framework 4.5, but it does nevertheless occur in certain special circumstances.
The optimizer is not allowed to transform the pattern of code stored in a local variable that is later used into having all uses of that variable just be the original expression used to initialize it. That's not a valid transformation to make, so it's not an "optimization". The expression can cause, or be dependent on, side effects, so that expression needs to be run, stored somewhere, and then used when specified. It would be an invalid transformation of the runtime to resolve the event to a delegate twice, when your code only has it done once.
As far as re-ordering is concerned; the re-ordering of operation is quite complicated with respect to multiple threads, but the whole point of this pattern is that you're now doing the relevant logic in a single threaded context. The value of the event is stored into a local, and that read can be ordered more or less arbitrarily with respect to any code running in other threads, but the read of that value into the local variable cannot be re-reordered with respect to subsequent operations of that same thread, namely the if
check or the invocation of that delegate.
Given that, the pattern does indeed do what it intends to do, which is to take a snapshot of the event and invokes all handlers, if there are any, without throwing a NRE due to there not being any handlers.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With