I've been having trouble understanding the performance characteristics of using Func<...> throughout my code when using inheritance and generics, which is a combination I find myself using all the time.
Let me start with a minimal test case so we all know what we're talking about; then I'll post the results, and finally I'll explain what I would expect and why...
Minimal test case
using System;
public class GenericsTest2 : GenericsTest<int>
{
    static void Main(string[] args)
    {
        GenericsTest2 at = new GenericsTest2();
        at.test(at.func);
        at.test(at.Check);
        at.test(at.func2);
        at.test(at.Check2);
        at.test((a) => a.Equals(default(int)));
        Console.ReadLine();
    }
    public GenericsTest2()
    {
        // Both fields wrap the inherited generic Check in a lambda
        func = func2 = (a) => Check(a);
    }
    protected Func<int, bool> func2;
    public bool Check2(int value)
    {
        return value.Equals(default(int));
    }
    public void test(Func<int, bool> func)
    {
        // The custom Stopwatch (below) prints the elapsed time when disposed
        using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
        {
            for (int i = 0; i < 100000000; ++i)
            {
                func(i);
            }
        }
    }
}
public class GenericsTest<T>
{
    public bool Check(T value)
    {
        return value.Equals(default(T));
    }
    protected Func<T, bool> func;
}
// Minimal timer: reports the wall-clock time between construction and Dispose()
public class Stopwatch : IDisposable
{
    public Stopwatch(Action<TimeSpan> act)
    {
        this.act = act;
        this.start = DateTime.UtcNow;
    }
    private Action<TimeSpan> act;
    private DateTime start;
    public void Dispose()
    {
        act(DateTime.UtcNow.Subtract(start));
    }
}
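The custom Stopwatch above measures wall-clock time with DateTime.UtcNow, whose resolution can be fairly coarse on some systems. A minimal sketch of the same measurement using System.Diagnostics.Stopwatch, which is designed for interval timing, might look like this. The class Bench and the method Measure are hypothetical names. The loop also counts how many calls returned true, so the results are actually used; this matters most for an inlined baseline loop, whose unused result the JIT might otherwise be free to drop as dead code.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs f for i = 0 .. iterations-1 and returns the elapsed time.
    // trueCount receives the number of calls that returned true; consuming
    // the results keeps the work observable to the JIT.
    public static TimeSpan Measure(Func<int, bool> f, int iterations, out int trueCount)
    {
        int count = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
        {
            if (f(i))
            {
                count++;
            }
        }
        sw.Stop();
        trueCount = count;
        return sw.Elapsed;
    }
}
```

Usage would be along the lines of `int hits; TimeSpan t = Bench.Measure(at.Check2, 100000000, out hits);`, followed by printing `t.TotalSeconds`.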
The results
Took 2.50s -> at.test(at.func);
Took 1.97s -> at.test(at.Check);
Took 2.48s -> at.test(at.func2);
Took 0.72s -> at.test(at.Check2);
Took 0.81s -> at.test((a) => a.Equals(default(int)));
What I would expect and why
I would have expected this code to run at exactly the same speed for all 5 methods. More precisely, I would have expected all of them to be even faster than any of these results, namely just as fast as:
using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
{
    for (int i = 0; i < 100000000; ++i)
    {
        bool b = i.Equals(default(int));
    }
}
// this takes 0.32s ?!?
I expected it to take 0.32s because I don't see any reason for the JIT compiler not to inline the code in this particular case.
On closer inspection, I don't understand these performance numbers at all:
- at.func is passed to the function and cannot be changed during execution. Why isn't this inlined?
- at.Check is apparently slower than at.Check2, while neither can be overridden and the IL of at.Check in the case of class GenericsTest2 is as fixed as a rock.
- I see no reason for a Func<int, bool> to be slower when passing an inline Func instead of a method that's converted to a Func.
Question
I'd really like to understand this... What is going on here that makes using a generic base class a whopping 10x slower than inlining the whole lot?
So, basically the question is: why is this happening and how can I fix it?
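One common workaround, offered here as a sketch rather than as the answer to this exact case, is to pass the predicate as a struct that implements an interface and to constrain a generic type parameter to it instead of taking a Func<int, bool>. For value-type type arguments the .NET JIT compiles a separate, specialized copy of the generic method, so pred.Invoke becomes a direct call that the JIT is able to inline, whereas a call through a delegate is an indirect call. The interface IIntPredicate, the struct IsDefault and the method CountMatches are hypothetical names.

```csharp
// Hypothetical "function object" interface: the predicate is a struct
// instead of a delegate.
public interface IIntPredicate
{
    bool Invoke(int value);
}

// Same logic as Check2: true when value equals default(int).
public struct IsDefault : IIntPredicate
{
    public bool Invoke(int value)
    {
        return value.Equals(default(int));
    }
}

public static class PredicateLoop
{
    // Because TPred is constrained to a struct, the JIT generates a copy of
    // this method specialized for each predicate type, so pred.Invoke is a
    // direct (non-virtual) call that may be inlined.
    public static int CountMatches<TPred>(TPred pred, int iterations)
        where TPred : struct, IIntPredicate
    {
        int count = 0;
        for (int i = 0; i < iterations; ++i)
        {
            if (pred.Invoke(i))
            {
                count++;
            }
        }
        return count;
    }
}
```

A call such as `PredicateLoop.CountMatches(new IsDefault(), 100000000)` lets the compiler infer TPred as IsDefault. The trade-off is that every predicate has to be written as a named struct rather than a lambda.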
UPDATE
Based on all the comments so far (thanks!) I did some more digging.
First off, here is a new set of results after repeating the tests with a loop 5x larger and executing them 4 times. I've used the Diagnostics Stopwatch and added more tests (with a description for each).
(Baseline implementation took 2.61s)
--- Run 0 ---
Took 3.00s for (a) => at.Check2(a)
Took 12.04s for Check3<int>
Took 12.51s for (a) => GenericsTest2.Check(a)
Took 13.74s for at.func
Took 16.07s for GenericsTest2.Check
Took 12.99s for at.func2
Took 1.47s for at.Check2
Took 2.31s for (a) => a.Equals(default(int))
--- Run 1 ---
Took 3.18s for (a) => at.Check2(a)
Took 13.29s for Check3<int>
Took 14.10s for (a) => GenericsTest2.Check(a)
Took 13.54s for at.func
Took 13.48s for GenericsTest2.Check
Took 13.89s for at.func2
Took 1.94s for at.Check2
Took 2.61s for (a) => a.Equals(default(int))
--- Run 2 ---
Took 3.18s for (a) => at.Check2(a)
Took 12.91s for Check3<int>
Took 15.20s for (a) => GenericsTest2.Check(a)
Took 12.90s for at.func
Took 13.79s for GenericsTest2.Check
Took 14.52s for at.func2
Took 2.02s for at.Check2
Took 2.67s for (a) => a.Equals(default(int))
--- Run 3 ---
Took 3.17s for (a) => at.Check2(a)
Took 12.69s for Check3<int>
Took 13.58s for (a) => GenericsTest2.Check(a)
Took 14.27s for at.func
Took 12.82s for GenericsTest2.Check
Took 14.03s for at.func2
Took 1.32s for at.Check2
Took 1.70s for (a) => a.Equals(default(int))
I noticed from these results that the moment you start using generics, it gets much slower. Digging a bit more into the IL, I found this for the non-generic implementation:
L_0000: ldarga.s 'value'
L_0002: ldc.i4.0
L_0003: call instance bool [mscorlib]System.Int32::Equals(int32)
L_0008: ret
and for all the generic implementations:
L_0000: ldarga.s 'value'
L_0002: ldloca.s CS$0$0000
L_0004: initobj !T
L_000a: ldloc.0
L_000b: box !T
L_0010: constrained. !T
L_0016: callvirt instance bool [mscorlib]System.Object::Equals(object)
L_001b: ret
While most of this can be optimized, I suppose the callvirt can be a problem here.
In an attempt to make it faster, I added the 'T : IEquatable<T>' constraint to the definition of the method. The result is:
L_0011: callvirt instance bool [mscorlib]System.IEquatable`1<!T>::Equals(!0)
While I understand more about the performance now (it probably cannot inline because it creates a vtable lookup), I'm still confused: Why doesn't it simply call T::Equals? After all, I do specify it will be there...
A Func<...> is a delegate type representing a function that accepts zero or more arguments and returns a value. It can be created from a method group or from a lambda expression, including a lambda with a statement body (e.g. x => { return x.Value > 0; }).
Inheritance has been one of the most popular characteristics of OOP since it was introduced, but it is no longer generally recommended as good practice. It's easy to find many discussions and articles on "Composition over Inheritance" as a caution for engineers. Some modern programming languages, like Go, don't even allow inheritance and offer only the alternative, composition.
Inheritance creates a class hierarchy, which you can imagine as a tree of classes. Composition, in contrast, builds complex types by combining objects (components) of other types, rather than inheriting from a base or parent class.
Always run micro benchmarks 3 times. The first run triggers the JIT, so it can be ruled out. Then check whether the 2nd and 3rd runs are equal. This gives:
... run ...
Took 0.79s
Took 0.63s
Took 0.74s
Took 0.24s
Took 0.32s
... run ...
Took 0.73s
Took 0.63s
Took 0.73s
Took 0.24s
Took 0.33s
... run ...
Took 0.74s
Took 0.63s
Took 0.74s
Took 0.25s
Took 0.33s
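The run-it-three-times approach can be sketched as a small helper that times the same body several times in a row. This is a hypothetical helper (Repeat.Time is not from the answer) built on System.Diagnostics.Stopwatch: the first timing includes JIT compilation of everything the body reaches for the first time, and the later timings are the ones to compare.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class Repeat
{
    // Times 'body' 'runs' times in a row. The first run includes JIT
    // compilation of any code 'body' reaches for the first time, so it is
    // usually discarded; later runs should agree with each other.
    public static List<TimeSpan> Time(Action body, int runs)
    {
        var timings = new List<TimeSpan>(runs);
        for (int r = 0; r < runs; ++r)
        {
            Stopwatch sw = Stopwatch.StartNew();
            body();
            sw.Stop();
            timings.Add(sw.Elapsed);
        }
        return timings;
    }
}
```

With the question's code this would be called as, for example, `Repeat.Time(() => at.test(at.Check2), 3)`.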
The line
func = func2 = (a) => Check(a);
adds an additional function call. Removing it by writing
func = func2 = this.Check;
gives:
... 1. run ...
Took 0.64s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s
... 2. run ...
Took 0.63s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s
... 3. run ...
Took 0.63s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s
This shows that the (JIT?) difference between the 1st and 2nd runs disappeared once the extra function call was removed. The first 3 tests are now equal.
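The extra call comes from the lambda: (a) => Check(a) compiles to a compiler-generated method whose body calls Check, so every invocation goes delegate -> generated method -> Check, whereas this.Check binds the delegate directly to Check. A small sketch that makes this visible through Delegate.Method; the class Demo is hypothetical, and the exact name of the generated method is a compiler implementation detail.

```csharp
using System;

public class Demo
{
    public bool Check(int value)
    {
        return value.Equals(default(int));
    }

    // Bound directly to Check: invoking the delegate calls Check.
    public Func<int, bool> FromMethodGroup()
    {
        return this.Check;
    }

    // Bound to a compiler-generated method that in turn calls Check,
    // which adds one call level per invocation.
    public Func<int, bool> FromLambda()
    {
        return (a) => Check(a);
    }
}
```

Printing `new Demo().FromLambda().Method.Name` shows the generated name rather than "Check".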
In tests 4 and 5, the compiler can inline the function argument to void test(Func<>), while in tests 1 to 3 it would be a long way for the compiler to figure out that the argument is constant. Sometimes the compiler has constraints that are not easy to see from a coder's perspective, such as .NET and JIT constraints stemming from the dynamic nature of .NET programs compared to a binary built from C++. In any case, it is the inlining of the function argument that makes the difference here.
The difference between 4 and 5? Well, in test 5 it looks like the compiler can very easily inline the function as well. Maybe it builds a context for the closure and resolves it in a slightly more complex way than needed. I did not dig into the MSIL to find out.
The tests above used .NET 4.5. Here are the results with .NET 3.5, demonstrating that the compiler got better at inlining:
... 1. run ...
Took 1.06s
Took 1.06s
Took 1.06s
Took 0.24s
Took 0.27s
... 2. run ...
Took 1.06s
Took 1.08s
Took 1.06s
Took 0.25s
Took 0.27s
... 3. run ...
Took 1.05s
Took 1.06s
Took 1.05s
Took 0.24s
Took 0.27s
And with .NET 4:
... 1. run ...
Took 0.97s
Took 0.97s
Took 0.96s
Took 0.22s
Took 0.30s
... 2. run ...
Took 0.96s
Took 0.96s
Took 0.96s
Took 0.22s
Took 0.30s
... 3. run ...
Took 0.97s
Took 0.96s
Took 0.96s
Took 0.22s
Took 0.30s
Now, changing GenericsTest<T> to a non-generic GenericsTest:
... 1. run ...
Took 0.28s
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.27s
... 2. run ...
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.27s
... 3. run ...
Took 0.25s
Took 0.25s
Took 0.25s
Took 0.24s
Took 0.27s
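The answer does not show the non-generic version. A minimal sketch, assuming the base class keeps its name, T is simply replaced by int, and the constructor uses the method-group assignment, could look like this. The CheckFunc property is hypothetical and only exists so the delegate can be used from outside.

```csharp
using System;

// Hypothetical non-generic base class: T replaced by int.
public class GenericsTest
{
    // Binds to Int32.Equals(int): no boxing and no call to Object.Equals.
    public bool Check(int value)
    {
        return value.Equals(default(int));
    }

    protected Func<int, bool> func;
}

public class GenericsTest2 : GenericsTest
{
    public GenericsTest2()
    {
        // Method group, so the delegate points straight at Check.
        func = this.Check;
    }

    // Hypothetical accessor for the protected delegate field.
    public Func<int, bool> CheckFunc
    {
        get { return func; }
    }
}
```

The relevant change is that Check now compiles to a direct call to Int32.Equals(int), the same IL as Check2 in the question.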
Well, this is a surprise from the C# compiler, similar to what I encountered when sealing classes to avoid virtual function calls. Maybe Eric Lippert has a word on that?
Replacing the inheritance with aggregation brings the performance back. I learned to never use inheritance (OK, very, very rarely), and I can highly recommend avoiding it, at least in this case. (This is my pragmatic solution to this question; no flame wars intended.) I use interfaces all the way, though, and they carry no performance penalties.
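Replacing inheritance with aggregation could look roughly like this. It is a sketch: the answer does not show its code, and the names IntChecker and CheckRunner are hypothetical. The runner holds a checker component in a field and builds its delegate from the component's method instead of inheriting Check from a generic base class.

```csharp
using System;

// Component with the check logic; no base class involved.
public sealed class IntChecker
{
    public bool Check(int value)
    {
        return value.Equals(default(int));
    }
}

// Uses the component through a field (aggregation) instead of inheriting.
public class CheckRunner
{
    private readonly IntChecker checker = new IntChecker();
    private readonly Func<int, bool> func;

    public CheckRunner()
    {
        // Field initializers run before this body, so checker is set here.
        func = checker.Check;
    }

    // Counts how many of 0 .. iterations-1 satisfy the check.
    public int Run(int iterations)
    {
        int count = 0;
        for (int i = 0; i < iterations; ++i)
        {
            if (func(i))
            {
                count++;
            }
        }
        return count;
    }
}
```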
I'm going to explain what I think is going on here and with generics in general. I needed some space to write, so I'm posting this as an answer. Thank you all for commenting and helping figure this out; I'll make sure to award points here and there.
To get started...
Compiling generics
As we all know, generics are 'template' types whose type information is filled in by the JIT compiler at run-time. The compiler can make assumptions based on the constraints, but it doesn't change the IL code... (but more about that later).
A method from my question:
public class Foo<T>
{
    public bool Handle(T foo)
    {
        return foo.Equals(default(T));
    }
}
The only constraint here is that T is an Object, which means the call to Equals goes to Object.Equals. Since every T implements Object.Equals, this will look like:
L_0016: callvirt instance bool [mscorlib]System.Object::Equals(object)
We can improve on this by making it explicit that T implements Equals, by adding the constraint T : IEquatable<T>. This changes the call to:
L_0011: callvirt instance bool [mscorlib]System.IEquatable`1<!T>::Equals(!0)
However, since T hasn't been filled in yet, apparently the IL doesn't support calling T::Equals(!0) directly, even though it is surely there. The compiler can apparently only assume the constraint has been fulfilled, hence it needs to issue a call to IEquatable`1, which defines the method.
Apparently hints like sealed don't make a difference, even though they should have.
Conclusion: Because T::Equals(!0) is not supported, a vtable lookup is required to make it work. Once it has become a callvirt, it's damn difficult for the JIT compiler to figure out that it should have just used a call.
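Which Equals overload the C# compiler binds to can be observed directly with a small struct that counts calls to each overload. This is a sketch; the names CountingPoint and EqualsBinding are hypothetical. Without a constraint, value.Equals(default(T)) binds to Object.Equals(object), so the argument is boxed; with where T : IEquatable<T>, it binds to IEquatable<T>.Equals(T).

```csharp
using System;

// Struct that records which Equals overload was called.
public struct CountingPoint : IEquatable<CountingPoint>
{
    public static int TypedCalls;   // calls to Equals(CountingPoint)
    public static int ObjectCalls;  // calls to Equals(object)

    public int X;

    public bool Equals(CountingPoint other)
    {
        TypedCalls++;
        return X == other.X;
    }

    public override bool Equals(object obj)
    {
        ObjectCalls++;
        // Compare directly so this path does not also bump TypedCalls.
        return obj is CountingPoint && X == ((CountingPoint)obj).X;
    }

    public override int GetHashCode()
    {
        return X;
    }
}

public static class EqualsBinding
{
    // Only Object.Equals(object) is known to exist for an unconstrained T.
    public static bool Unconstrained<T>(T value)
    {
        return value.Equals(default(T));
    }

    // The constraint lets the compiler bind to IEquatable<T>.Equals(T).
    public static bool Constrained<T>(T value) where T : IEquatable<T>
    {
        return value.Equals(default(T));
    }
}
```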
What should happen: Basically, Microsoft should support T::Equals(!0) when this method clearly exists. That changes the call to a normal call in IL, making it much faster.
But it gets worse
So what about calling Foo::Handle?
What surprised me is that the call to Foo<T>::Handle is also a callvirt and not a call. The same behavior can be found for e.g. List<T>::Add and so on. My observation was that only calls that use this will become a normal call; everything else will compile as a callvirt.
Conclusion: The behavior is as if you get a class structure like Foo<int>:Foo<T>:[the rest], which doesn't really make sense. Apparently all calls to a generic class from outside that class will compile to a vtable lookup.
What should happen: Microsoft should change the callvirt to a call if the method is non-virtual. There's really no reason at all for the callvirt.
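For what it's worth, a commonly cited reason for the C# compiler emitting callvirt even for non-virtual instance methods, in generic and non-generic classes alike, is that callvirt checks the receiver for null while call does not. The sketch below shows the observable effect: calling a non-virtual method that never touches this on a null reference still throws NullReferenceException. The names Widget and NullCheckDemo are hypothetical.

```csharp
using System;

public class Widget
{
    // Non-virtual and does not use 'this'.
    public int Answer()
    {
        return 42;
    }
}

public static class NullCheckDemo
{
    // Returns true if calling Answer() on a null Widget throws.
    public static bool ThrowsOnNull()
    {
        Widget w = null;
        try
        {
            w.Answer();
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

Whether the JIT can then turn such a callvirt into a cheap direct call (or inline it) is a separate question from the IL opcode the C# compiler emits.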
Conclusion
If you use generics from another type, be prepared to get a callvirt instead of a call, even if this isn't necessary. The resulting performance is basically what you can expect from such a call...
IMHO this is a real shame. Type safety should help developers and at the same time make your code faster, because the compiler can make assumptions about what's going on. My lesson learned from all this is: don't use generics unless you don't care about the extra vtable lookups (until Microsoft fixes this).
Future work
First off, I'm going to post this on Microsoft Connect. I think this is a serious bug in .NET that drains performance without any good reason. ( https://connect.microsoft.com/VisualStudio/feedback/details/782346/using-generics-will-always-compile-to-callvirt-even-if-this-is-not-necessary )
Results from Microsoft Connect
Yes, we have results, with my express thanks to Mike Danes!
The method call to foo.Equals(default(T)) will compile to Object.Equals(boxed[new !0]) because the only Equals that all T's have in common is Object.Equals. This will cause a boxing operation and a vtable lookup.
If we want it to use the correct Equals, we have to give the compiler a hint, namely that the type implements bool Equals(T). This can be done by telling the compiler that the type T implements IEquatable<T>.
In other words: change the signature of the class as follows:
public class GenericsTest<T> where T : IEquatable<T>
{
    public bool Check(T value)
    {
        return value.Equals(default(T));
    }
    protected Func<T, bool> func;
}
When you do it like this, the runtime will find the correct Equals method. Phew...
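An alternative that avoids putting the IEquatable<T> constraint on the class is EqualityComparer<T>.Default, which uses IEquatable<T>.Equals when T implements it and falls back to Object.Equals otherwise, so an int is not boxed on this path. Whether the JIT also devirtualizes and inlines the comparer call depends on the runtime (newer .NET runtimes reportedly special-case it; the .NET Framework versions discussed here may not), so this is a sketch to benchmark, not a guaranteed win. The class name DefaultChecker is hypothetical.

```csharp
using System.Collections.Generic;

public class DefaultChecker<T>
{
    // Resolves to IEquatable<T>.Equals when T implements it, otherwise to
    // Object.Equals; no constraint on T is needed.
    public bool Check(T value)
    {
        return EqualityComparer<T>.Default.Equals(value, default(T));
    }
}
```

A side benefit over value.Equals(default(T)) is that a null value of a reference type is handled instead of throwing NullReferenceException.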
To solve the puzzle completely, one more element is required: .NET 4.5. The .NET 4.5 runtime is able to inline this method, thereby making it as fast as it should be again. In .NET 4.0 (that's what I'm currently using), this functionality doesn't appear to be there. The call will still be a callvirt in IL, but the .NET 4.5 runtime will solve the puzzle regardless.
If you test this code, it should be just as fast as the fastest test cases. Can someone please confirm this?