Consider the following code:
public static void M() {
    A(V);
    A(V);
    A(V);
}

public static void V() {
}

public static void A(Action x) {
    x();
}
This gets compiled behind-the-scenes as:
public static void M() {
    A(new Action(V));
    A(new Action(V));
    A(new Action(V));
}
However, we can write our own simple performance improvement that reduces unnecessary garbage:
private static readonly Action v = new Action(V);

public static void M() {
    A(v);
    A(v);
    A(v);
}
For this very simple case, is there any reason Roslyn couldn't make a similar optimisation?
If the answer is no, what about when the methods are not static but instance members? And what about when there are closed-over variables captured?
we can write our own simple performance improvement that reduces unnecessary garbage
You have rediscovered a special case of common subexpression elimination -- the optimization of identifying when two or more expressions have exactly the same value, computing the value once, and storing it in a variable to be re-used.
Before continuing, I caution you that all so-called "optimizations" are actually trading off one thing for another. Your proposed optimization trades a small amount of collection pressure generated on each call for a small memory leak instead: the cached delegate in the static field becomes a permanent member of the gen 2 heap. Is that worthwhile? It's a question that you'd want to answer by actually making measurements.
For this very simple case, is there any reason Roslyn couldn't make a similar optimisation?
There is no in-principle reason why this optimization could not be performed, provided it did not produce an unacceptable change in the behaviour of the program.
In particular, the optimization causes two delegates that were previously value-equal but not reference-equal to become reference-equal. That's likely acceptable.
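That distinction is easy to observe directly. The following sketch (the class and names are mine, for illustration) shows that two method-group conversions produce value-equal delegates, while whether they are also reference-equal is an implementation detail:

```csharp
using System;

class DelegateEqualityDemo
{
    public static void V() { }

    static void Main()
    {
        Action a = new Action(V);
        Action b = new Action(V);

        // Delegate equality compares the target object and the method,
        // so two conversions of the same method group are value-equal.
        Console.WriteLine(a == b); // True

        // Whether they are the same object depends on the compiler:
        // historically each conversion allocated a fresh delegate, and
        // newer compilers (C# 11+) may cache static method group
        // conversions, which is exactly the change discussed above.
        Console.WriteLine(ReferenceEquals(a, b));
    }
}
```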
In practice, implementing optimizations requires large amounts of effort in designing, implementing, testing and maintaining the code that does the optimization. C# does not implement common subexpression elimination optimizations. This optimization has poor bang-for-buck. Few people write code that would benefit from the optimization, and the optimization is small, and it is easy, as you discovered, to do the optimization "by hand" if you care.
I note that C# does do a similar cache on lambdas. It will not do common subexpression elimination, but it will generate certain lambdas only once and cache the results:
void M() { Action x = () => {}; ... }
is generated as though you wrote:
static Action anon = null;
void M()
{
    if (anon == null) anon = () => {};
    Action x = anon;
    ...
}
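This caching is observable. The sketch below (the GetAction helper is my own, for illustration) returns the same non-capturing lambda twice; with a Roslyn-based compiler both calls hand back the one cached delegate, though the specification only permits this rather than requiring it:

```csharp
using System;

class LambdaCacheDemo
{
    // A non-capturing lambda: the compiler hoists it into a hidden
    // static field, so repeated conversions reuse one delegate.
    public static Action GetAction() => () => { };

    static void Main()
    {
        // True in practice with Roslyn, but treat it as an
        // implementation detail, not a guarantee.
        Console.WriteLine(ReferenceEquals(GetAction(), GetAction()));
    }
}
```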
If the answer is no, what about when the methods are not static but instance members?
There is no in-principle reason why this optimization could not be performed, provided it did not produce an unacceptable change in the behaviour of the program.
I note that in this case the optimization would, of course, be required to deduce when the receiver instances were the same. To not do so would be to fail to maintain the invariant that program behaviour must not change.
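A small sketch of why receiver identity matters (the Receiver class here is illustrative, not from the question): delegates over the same method but different instances are not equal, so a single cached delegate could not stand in for both without changing behaviour:

```csharp
using System;

class Receiver
{
    public void V() { }
}

class InstanceDelegateDemo
{
    static void Main()
    {
        var r1 = new Receiver();
        var r2 = new Receiver();

        Action a = new Action(r1.V);
        Action b = new Action(r1.V); // same receiver, same method
        Action c = new Action(r2.V); // different receiver

        Console.WriteLine(a == b); // True: same target and method
        Console.WriteLine(a == c); // False: different targets, so a
                                   // cached delegate for r1 cannot be
                                   // reused for r2
    }
}
```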
Again, in practice, C# does not do common subexpression elimination.
And what about when there are closed-over variables captured?
Captured by what? You were talking about method group conversions to delegates just now, and apparently now we are talking about lambdas converted to delegates.
The C# specification explicitly states that the compiler may choose to do common subexpression elimination on identical lambdas, or not, as it sees fit.
There is no in-principle reason why this optimization could not be performed, provided it did not produce an unacceptable change in the behaviour of the program. Since the specification explicitly calls out that this optimization is permitted, it is by definition acceptable.
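To illustrate the capture case (this CaptureDemo is my own sketch, not from the original discussion): three syntactically identical lambdas that close over different variables must remain three distinct delegates, because each one carries its own captured state:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

class CaptureDemo
{
    public static string Run()
    {
        var actions = new List<Action>();
        var output = new StringBuilder();

        for (int i = 0; i < 3; i++)
        {
            int copy = i; // each iteration captures its own variable
            actions.Add(() => output.Append(copy));
        }

        // Three identical-looking lambdas, but each closes over a
        // different 'copy', so no single cached delegate could serve.
        foreach (var a in actions) a();
        return output.ToString();
    }

    static void Main() => Console.WriteLine(Run()); // prints 012
}
```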
Again, in practice, C# does not do common subexpression elimination.
Perhaps you are noticing a trend here. The answer to the question "is such and such an optimization permitted?" is almost always "yes, if it does not produce an unacceptable change in the behaviour of the program". But the answer to the question "does C# implement such and such an optimization in practice?" is usually no.
If you want some background on the optimizations the compiler does perform, I described them in 2009.
Roslyn does a better job of these optimizations for the most part. For example, Roslyn does a better job of reifying temporary values and locals as ephemeral rather than durable variables. I completely rewrote the nullable arithmetic optimizer; my eight-part series of articles describing how is here. And there were many more improvements. We never considered doing CSE though.
There are a number of different components and nuances to your question, so I'll attempt to break them down one at a time.
First, the compiler is performing magic through syntactic sugar. As you noted,
public static void M() { A(V); }
is the equivalent of
public static void M() { A(new Action(V)); }
but the compiler is saving you the trouble of having to declare the Action instance explicitly. In either case, however, the generated IL needs to perform a series of steps:
IL_000C: ldnull
IL_000D: ldftn UserQuery.V
IL_0013: newobj System.Action..ctor
IL_0018: call UserQuery.A
IL_001D: nop
IL_001E: ldnull
IL_001F: ldftn UserQuery.V
IL_0025: newobj System.Action..ctor
IL_002A: call UserQuery.A
IL_002F: nop
IL_0030: ldnull
IL_0031: ldftn UserQuery.V
IL_0037: newobj System.Action..ctor
IL_003C: call UserQuery.A
A native function pointer to our V method is generated at instruction IL_000D. The preceding instruction, IL_000C: ldnull, pushes a null delegate target because the method is static; otherwise we'd see IL_000C: ldarg.0, since the instance would need to be pushed onto the evaluation stack as the delegate's target. In either case, however, a new Action instance still needs to be created at instruction IL_0013: newobj, since at that point we only have a raw method pointer, not a delegate instance, and A expects a delegate. Finally, once we have our pointer and new instance, we can invoke our A method.
However, in your second example, things change:
IL_0001: ldsfld UserQuery.v
IL_0006: call UserQuery.A
IL_000B: nop
IL_000C: ldsfld UserQuery.v
IL_0011: call UserQuery.A
IL_0016: nop
IL_0017: ldsfld UserQuery.v
IL_001C: call UserQuery.A
Instead of generating method pointers or creating new objects, we simply push the value of our static field v onto the evaluation stack with the ldsfld instructions. Since the delegate already exists, we don't have to perform any additional operations before calling our A method.
Once again, an instance method would add an extra instruction in this second example, but it wouldn't change how the parameter is generated and passed. That is the fundamental reason Roslyn doesn't make this optimization here: the compiler is obligated to generate IL that the runtime understands and expects, and turning the first case into the second isn't a local tweak but a fundamentally different set of instructions.