Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Will the C# compiler optimize this code?

I run into this scenario frequently. At first glance, I think, “That’s poor coding; I’m executing a method twice and necessarily getting the same result.” But upon thinking that, I have to wonder if the compiler is as smart as I am and can come to the same conclusion.

var newList = oldList.Select(x => new Thing {
    FullName = String.Format("{0} {1}", x.FirstName, x.LastName),
    OtherThingId = x.GetOtherThing() != null : x.GetOtherThing().Id : 0 // Might call x.GetOtherThing() twice?
});

Does the behavior of the compiler depend on the contents of the GetOtherThing method? Say it looks like this (somewhat similar to my real code right now):

public OtherThing GetOtherThing() {
    if (this.Category == null) return null;
    return this.Category.OtherThings.FirstOrDefault(t => t.Text == this.Text);
}

That will, barring very poorly handled asynchronous changes to whatever store these objects are coming from, definitely return the same thing if run twice in a row. But what if it looked like this (nonsensical example for the sake of argument):

public OtherThing GetOtherThing() {
    return new OtherThing {
        Id = new Random().Next(100)
    };
}

Running that twice in a row would result in the creation of two different objects, with different Ids in all likelihood. What would the compiler do in these situations? Is it as inefficient as it seems to do what I showed in my first listing?

Doing some of the work myself

I ran something very similar to that first code listing and put a breakpoint in the GetOtherThing instance method. The breakpoint was hit once. So, it looks like the result is indeed cached. What would happen in the second case, where the method might return something different each time? Would the compiler optimize incorrectly? Are there any caveats to the result that I found?

EDIT

That conclusion was invalid. See comments under @usr’s answer.

like image 969
Andrew Avatar asked Feb 18 '14 20:02

Andrew


2 Answers

There are two compilers to consider here: the C# compiler that turns C# into IL, and the IL compiler that turns IL into machine code -- called the jitter, because it happens Just In Time.

The Microsoft C# compiler certainly does no such optimization. Method calls are generated as method calls, end of story.

The jitter is permitted to perform the optimization you describe provided that doing so cannot be detected. For example, suppose you had:

y = M() != 0 ? M() : N()

and

static int M() { return 1; }

The jitter is permitted to turn this program into:

y = 1 != 0 ? 1 : N()

or for that matter

y = 1;

Whether the jitter does so or not is an implementation detail; you'll have to ask an expert on the jitter whether it actually does perform this optimization if you care.

Similarly, if you had

static int m;
static int M() { return m; }

then the jitter could optimize that into

y = m != 0 ? m : N()

or even into:

int q = m;
y = q != 0 ? q : N();

because the jitter is permitted to turn two field reads in a row with no intervening write into a single field read, provided that the field is not volatile. Again whether it does so or not is an implementation detail; ask a jitter developer.

However, in your latter example the jitter cannot elide the second call because it has a side effect.

I ran something very similar to that first code listing and put a breakpoint in the GetOtherThing instance method. The breakpoint was hit once.

That is highly improbable. Almost all optimizations are turned off when you are debugging, precisely so that it is easier to debug. As Sherlock Holmes never said, when you eliminate the improbable, the most likely explanation is that the original poster was mistaken.

like image 197
Eric Lippert Avatar answered Sep 22 '22 10:09

Eric Lippert


The compiler can only apply optimizations if you cannot tell the difference. In your "random" example you can clearly tell the difference. It cannot be "optimized" this way. It would violate the C# spec. In fact the spec does not talk about optimizations a lot. It just says what you should observe the program do. In this case, it specifies that two random numbers should be drawn.

In the first example, it might be possible to apply this optimization. It will never occur in practice. Here are some things that make it hard:

  • The data the query operates on could be changed by a virtual function call of yours, or your lambda (t => t.Text == this.Text) could change the list. Very insidious.
  • It could be changed by another thread. I'm not sure what the .NET memory model says about this.
  • It could be changed by reflection.
  • It must be provable that the computation will always return the same value. How would you prove that? You would need to analyze all code that could possibly run. Including virtual calls and data-dependent control-flow.

All of this has to work across non-inlined methods and across assemblies.

The C# compiler cannot do this because it cannot look into mscorlib. A patch release might change mscorlib at any time.

The JIT is a poor JIT (alas) and it is optimized for speed of compilation (alas). It does not do this. If you are in doubt whether the current JIT will do some advanced optimization or not, it is a safe bet that it won't.

like image 32
usr Avatar answered Sep 22 '22 10:09

usr