
Can the C# compiler or JIT optimize away a method call in a lambda expression?

I'm starting this question after a discussion which started (in comments) on another StackOverflow question, and I'm intrigued to know the answer. Considering the following expression:

var objects = RequestObjects.Where(r => r.RequestDate > ListOfDates.Max());

Will there be any (performance) advantage of moving the evaluation of ListOfDates.Max() out of the Where clause in this case, or will 1. the compiler or 2. JIT optimize this away?
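For concreteness, here is the hoisted version being asked about as a runnable sketch (the Request type and the sample data are made up purely to make the snippet compile; ListOfDates and RequestObjects are the question's own names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class HoistDemo
{
    // Hypothetical stand-in for the question's request objects.
    class Request
    {
        public DateTime RequestDate;
        public Request(DateTime d) { RequestDate = d; }
    }

    static void Main()
    {
        var ListOfDates = new List<DateTime> { new DateTime(2016, 1, 1), new DateTime(2016, 3, 1) };
        var RequestObjects = new List<Request>
        {
            new Request(new DateTime(2016, 2, 1)),
            new Request(new DateTime(2016, 4, 1)),
        };

        // Original form: ListOfDates.Max() is re-evaluated for every element.
        var slow = RequestObjects.Where(r => r.RequestDate > ListOfDates.Max()).ToList();

        // Hoisted form: Max() runs exactly once, before the query executes.
        var maxDate = ListOfDates.Max();
        var fast = RequestObjects.Where(r => r.RequestDate > maxDate).ToList();

        Console.WriteLine(slow.Count == fast.Count); // True: same result, fewer Max() calls
    }
}
```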

I believe C# will only do constant folding at compile time, and it could be argued that ListOfDates.Max() can not be known at compile time unless ListOfDates itself is somehow constant.
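A tiny sketch of that distinction (Math.Max stands in here for any method call; the claim is about what the C# compiler emits, not what the JIT may do later):

```csharp
using System;

class FoldDemo
{
    static void Main()
    {
        int a = 2 + 3;          // literal arithmetic: the C# compiler folds this to 5 in the IL
        int b = Math.Max(2, 3); // a method call: emitted as a call, not folded by the C# compiler
        Console.WriteLine(a);
        Console.WriteLine(b);
    }
}
```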

Perhaps there is another compiler (or JIT) optimization that makes sure that this is only evaluated once?

ChristopheD asked Apr 06 '16 23:04

1 Answer

Well, it's a bit of a complex answer.

There are two things involved here: (1) the compiler and (2) the JIT.

The compiler

Simply put, the compiler just translates your C# code to IL code. It's a pretty trivial translation for most cases and one of the core ideas of .NET is that each function is compiled as an autonomous block of IL code.

So, don't expect too much from the C# -> IL compiler.

The JIT

That's... a bit more complicated.

The JIT compiler basically translates your IL code to assembler. The JIT compiler also contains an SSA based optimizer. However, there's a time limit, because we don't want to wait too long before our code starts to run. Basically this means that the JIT compiler doesn't do all the super cool stuff that will make your code go extremely fast, simply because that would cost too much time.

We can of course just put it to the test :) Ensure VS will optimize when you run (options -> debugger -> uncheck suppress [...] and just my code), compile in x64 release mode, put a breakpoint and see what happens when you switch to assembler view.

But hey, what's the fun in only having theory; let's put it to the test. :)

static bool Foo(Func<int, int, int> foo, int a, int b)
{
    return foo(a, b) > 0;  // put breakpoint on this line.
}

public static void Test()
{
    int n = 2;
    int m = 2;
    if (Foo((a, b) => a + b, n, m)) 
    {
        Console.WriteLine("yeah");
    }
}

The first thing you should notice is that the breakpoint is hit. This already tells you that the method isn't inlined; if it were, you wouldn't hit the breakpoint at all.

Next, if you watch the assembler output, you'll notice a 'call' instruction using an address. Here's your function. On closer inspection, you'll notice that it's calling the delegate.

Now, basically this means that the call is not inlined, and therefore is not optimized to match the local (method) context. In other words, not using delegates and putting stuff in your method is probably faster than using delegates.

On the other hand, the call is pretty efficient. Basically the function pointer is simply passed and called. There's no vtable lookup, just a simple call. This means it probably beats calling a member (e.g. IL callvirt). Still, static calls (IL call) should be even faster, since these are predictable at compile time. Again, let's test, shall we?

// Definitions the test relies on (not shown in the original answer);
// minimal versions would be:
interface ISummer
{
    int Sum(int a, int b);
}

class Summer : ISummer
{
    public int Sum(int a, int b) { return a + b; }
}

static int Sum(int a, int b) { return a + b; }

public static void Test()
{
    ISummer summer = new Summer();
    Stopwatch sw = Stopwatch.StartNew();
    int n = 0;
    for (int i = 0; i < 1000000000; ++i)
    {
        n = summer.Sum(n, i);
    }
    Console.WriteLine("Vtable call took {0} ms, result = {1}", sw.ElapsedMilliseconds, n);

    Summer summer2 = new Summer();
    sw = Stopwatch.StartNew();
    n = 0;
    for (int i = 0; i < 1000000000; ++i)
    {
        n = summer2.Sum(n, i);
    }
    Console.WriteLine("Non-vtable call took {0} ms, result = {1}", sw.ElapsedMilliseconds, n);

    Func<int, int, int> sumdel = (a, b) => a + b;
    sw = Stopwatch.StartNew();
    n = 0;
    for (int i = 0; i < 1000000000; ++i)
    {
        n = sumdel(n, i);
    }
    Console.WriteLine("Delegate call took {0} ms, result = {1}", sw.ElapsedMilliseconds, n);

    sw = Stopwatch.StartNew();
    n = 0;
    for (int i = 0; i < 1000000000; ++i)
    {
        n = Sum(n, i);
    }
    Console.WriteLine("Static call took {0} ms, result = {1}", sw.ElapsedMilliseconds, n);
}

Results:

Vtable call took 2714 ms, result = -1243309312
Non-vtable call took 2558 ms, result = -1243309312
Delegate call took 1904 ms, result = -1243309312
Static call took 324 ms, result = -1243309312

The interesting thing here is actually the last test result. Remember that static calls (IL call) are completely deterministic. That makes them relatively easy for the compiler to optimize. If you inspect the assembler output, you'll find that the call to Sum is actually inlined. This makes sense. In fact, if you test it, putting the code directly in the method is just as fast as the static call.

A small remark about Equals

If you measure the performance of hash tables, something might seem fishy with my explanation: it appears as if IEquatable<T> makes things go faster.

Well, that's actually true. :-) Hash containers use IEquatable<T> to call Equals. Now, as we all know, all objects implement Equals(object o). So, the containers can either call Equals(object) or Equals(T). The performance of the call itself is the same.

However, if you also implement IEquatable<T>, the implementation usually looks like this:

public override bool Equals(object o)
{
    var obj = o as MyType;
    return obj != null && this.Equals(obj);
}

Furthermore, if MyType is a struct, the runtime also needs to box and unbox the argument. If the container could just call IEquatable<T>.Equals directly, none of these steps would be necessary. So, even though Equals(object) appears slower, this has nothing to do with the call itself.
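A sketch of what that looks like for a struct (the Point type here is made up for illustration): implementing IEquatable<T> gives hash containers a strongly typed Equals to call, so no boxing is needed.

```csharp
using System;

struct Point : IEquatable<Point>
{
    public int X, Y;
    public Point(int x, int y) { X = x; Y = y; }

    // Strongly typed overload: no boxing, no cast.
    public bool Equals(Point other) => X == other.X && Y == other.Y;

    // Object overload: the argument arrives boxed and must be type-tested and unboxed.
    public override bool Equals(object o) => o is Point p && Equals(p);

    public override int GetHashCode() => X * 31 + Y;
}

class EquatableDemo
{
    static void Main()
    {
        // EqualityComparer<Point>.Default picks the IEquatable<Point> path,
        // so Dictionary/HashSet lookups on Point avoid boxing entirely.
        Console.WriteLine(new Point(1, 2).Equals(new Point(1, 2)));
    }
}
```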

Your questions

Will there be any (performance) advantage of moving the evaluation of ListOfDates.Max() out of the Where clause in this case, or will 1. the compiler or 2. JIT optimize this away?

Yes, there will be an advantage. The compiler / JIT won't optimize it away.

I believe C# will only do constant folding at compile time, and it could be argued that ListOfDates.Max() can not be known at compile time unless ListOfDates itself is somehow constant.

Actually, if you change the static call to n = 2 + Sum(n, 2) you'll notice that the assembler output contains a 4. Which proves that the JIT optimizer does do constant folding. (Which is quite obvious if you know how SSA optimizers work; constant folding and simplification are run a few times.)

The function pointer itself isn't optimized. It might be in the future though.

Perhaps there is another compiler (or JIT) optimization that makes sure that this is only evaluated once?

As for 'another compiler', if you're willing to add 'another language', you can use C++. In C++ these kinds of calls are sometimes optimized away.

More interestingly, Clang is based on LLVM, and there are a few C# compilers for LLVM as well. I believe Mono has an option to optimize to LLVM, and CoreCLR was working on LLILC. While I haven't tested this, LLVM can definitely do these kinds of optimizations.

atlaste answered Sep 29 '22 08:09