Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does the C# compiler create a new Action instance for every passed delegate?

Consider the following code:

public static void M() {
    A(V);
    A(V);
    A(V);
}

public static void V() {

}

public static void A(Action x) {
    x();   
}

This gets compiled behind-the-scenes as:

public static void M() {
    A(new Action(V));
    A(new Action(V));
    A(new Action(V));
}

However, we can write our own simple performance improvement that reduces unnecessary garbage:

private static readonly Action v = new Action(V);
A(v);
A(v);
A(v);

For this very simple case, is there any reason Roslyn couldn't make a similar optimisation?

If the answer is no, what about when the methods are not static but instance members? And what about when there are closed-over variables captured?

like image 861
Xenoprimate Avatar asked Mar 31 '17 03:03

Xenoprimate


2 Answers

we can write our own simple performance improvement that reduces unnecessary garbage

You have rediscovered a special case of common subexpression elimination -- the optimization of identifying when two or more expressions have exactly the same value, computing the value once, and storing it in a variable to be re-used.

Before continuing, I caution you that all so-called "optimizations" are actually trading off one thing for another. Your proposed optimization trades a small amount of collection pressure generated on each call for instead a small memory leak. The cached value in the static field would become a permanent member of the gen 2 heap. Is that worthwhile? It's a question that you'd want to answer by actually making measurements.

For this very simple case, is there any reason Roslyn couldn't make a similar optimisation?

There is no in principle reason why this optimization could not be performed if the optimization did not produce an unacceptable change in the behaviour of the program.

In particular, the optimization causes two delegates that were previously value-equal but not reference-equal to become reference-equal. That's likely acceptable.

In practice, implementing optimizations requires large amounts of effort in designing, implementing, testing and maintaining the code that does the optimization. C# does not implement common subexpression elimination optimizations. This optimization has poor bang-for-buck. Few people write code that would benefit from the optimization, and the optimization is small, and it is easy, as you discovered, to do the optimization "by hand" if you care.

I note that C# does do a similar cache on lambdas. It will not do common subexpression elimination, but it will generate certain lambdas only once and cache the results:

void M() { Action x = () => {}; ... }

is generated as though you wrote:

static Action anon = null;
void M() 
{
  if (anon == null) anon = () => {};
  Action x = anon;
  ...

If the answer is no, what about when the methods are not static but instance members?

There is no in principle reason why this optimization could not be performed if the optimization did not produce an unacceptable change in the behaviour of the program.

I note that in this case the optimization would be required to deduce when the instances were the same of course. To not do so would be to fail to maintain the invariant that program behaviour must not change.

Again, in practice, C# does not do common subexpression elimination.

And what about when there are closed-over variables captured?

Captured by what? You were talking about method group conversions to delegates just now, and apparently now we are talking about lambdas converted to delegates.

The C# specification explicitly states that the compiler may choose to do common subexpression elimination on identical lambdas, or not, as it sees fit.

There is no in principle reason why this optimization could not be performed if the optimization did not produce an unacceptable change in the behaviour of the program. Since the specification explicitly calls out that this optimization is permitted, it is by definition acceptable.

Again, in practice, C# does not do common subexpression elimination.

Perhaps you are noticing a trend here. The answer to the question "is such and such an optimization permitted?" is almost always "yes, if it does not produce an unacceptable change in the behaviour of the program". But the answer to the question "does C# implement such and such an optimization in practice?" is usually no.

If you want some background on the optimizations the compiler does perform, I described them in 2009.

Roslyn does a better job of these optimizations for the most part. For example, Roslyn does a better job of reifying temporary values and locals as ephemeral rather than durable variables. I completely rewrote the nullable arithmetic optimizer; my eight-part series of articles describing how is here. And there were many more improvements. We never considered doing CSE though.

like image 113
Eric Lippert Avatar answered Nov 15 '22 06:11

Eric Lippert


There are a number of different components and nuances to your question, so I'll attempt to break them down one and a time.

First, the compiler is performing magic through syntactic sugar. As you noted,

public static void M() { A(V); }

is the equivalent of

public static void M() { A(new Action(V)); }

but the compiler is saving you the trouble of having to declare the action instance directly. In either case however, the generated IL needs to perform a series of steps:

IL_000C:  ldnull  
IL_000D:  ldftn       UserQuery.V
IL_0013:  newobj      System.Action..ctor
IL_0018:  call        UserQuery.A
IL_0014:  ldarg.0     
IL_0015:  ldarg.0     
IL_0016:  ldftn       UserQuery.V
IL_001C:  newobj      System.Action..ctor
IL_0021:  call        UserQuery.A
IL_0027:  ldarg.0     
IL_0028:  ldarg.0     
IL_0029:  ldftn       UserQuery.V
IL_002F:  newobj      System.Action..ctor
IL_0034:  call        UserQuery.A

A native pointer is generated for our V method at instruction IL_000D. The previous instruction is simply telling us that the method is static, otherwise we'd see instruction IL_000C: ldarg.0 since our instance method argument would need to be pushed onto the evaluation stack. However, in either case, the new action instance still needs to be generated at instruction IL_0013: newobj since we are passing a method pointer (under the hood), not a method instance. Finally, once we have our pointer and new instance, we can invoke our A method.

However, in your second example, things shake up:

IL_0001:  ldsfld      UserQuery.v
IL_0006:  call        UserQuery.A
IL_000B:  nop         
IL_000C:  ldsfld      UserQuery.v
IL_0011:  call        UserQuery.A
IL_0016:  nop         
IL_0017:  ldsfld      UserQuery.v
IL_001C:  call        UserQuery.A

Instead of generating pointers or creating new objects, we simply push the value of our static field v onto the evaluation stack in our ldsfld instructions. Since we have the value, we don't have to perform any additional operations other than call our A method.

Once again, in our second example an additional instruction is generated for instance method declarations, but it doesn't change how the parameter is generated and passed, which is the fundamental underlying reason why Roslyn wouldn't optimize...the compiler is obligated to generate IL that the runtime understands and expects. Trying to optimize your first case to act like the second case is a fundamentally different set of instructions and it can't be optimized as such.

like image 42
David L Avatar answered Nov 15 '22 06:11

David L