I am working on some heavy cpu bound problem. I see a big performance improvement when I use the inline
keyword.
I create a dictionary from the standard .net library passing in a custom key Comparer see code and timing results below
https://gist.github.com/4409734
without inline keyword on Eq_cmp
> perf_run 10000000 ;;
Real: 00:00:11.039, CPU: 00:00:11.029, GC gen0: 771, gen1: 3, gen2: 1
val it : unit = ()
using inline keyword on Eq_cmp
perf_run 10000000 ;;
Real: 00:00:01.319, CPU: 00:00:01.388, GC gen0: 1, gen1: 1, gen2: 1
val it : unit = ()
>
I also noticed the huge difference in the amount of Gen 0 GC with the inlined code and non inlined code.
Could someone explain why there is such a huge difference?
Simply put, the Y=f(x) equation calculates the dependent output of a process given different inputs.
What Does f ' Say About f ? The first derivative of a function is an expression which tells us the slope of a tangent line to the curve at any instant. Because of this definition, the first derivative of a function tells us much about the function. If is positive, then must be increasing.
f(f(x)) means replace each x in f(x) by the entire function f(x). For example, if f(x) = x3 + 13x + 9, then f(f(x)) = (x3 + 13x + 9)3 + 13(x3 + 13x + 9) + 9. However, f2(x) would mean (f(x))2 so, in the above example, f2(x) = (x3 + 13x + 9)2
I can reproduce the behavior on my machine with 3x performance boost after adding inline
keyword.
Decompiling two versions side by side under ILSpy gives almost identical C# code. The notable difference is in two equality tests:
// Version without inline
bool IEqualityComparer<Program.Pair<a>>.System-Collections-Generic-IEqualityComparer(Program.Pair<a> x, Program.Pair<a> y)
{
a v@ = x.v@;
a v@2 = y.v@;
if (LanguagePrimitives.HashCompare.GenericEqualityIntrinsic<a>(v@, v@2))
{
a w@ = x.w@;
a w@2 = y.w@;
return LanguagePrimitives.HashCompare.GenericEqualityIntrinsic<a>(w@, w@2);
}
return false;
}
// Version with inline
bool IEqualityComparer<Program.Pair<int>>.System-Collections-Generic-IEqualityComparer(Program.Pair<int> x, Program.Pair<int> y)
{
int v@ = x.v@;
int v@2 = y.v@;
if (v@ == v@2)
{
int w@ = x.w@;
int w@2 = y.w@;
return w@ == w@2;
}
return false;
}
The generic equality is much less efficient than the specialized version.
I also noticed the huge difference in the amount of Gen 0 GC with the inlined code and non inlined code.
Could someone explain why there is such a huge difference?
Taking a look at GenericEqualityIntrinsic
function in F# source code:
let rec GenericEqualityIntrinsic (x : 'T) (y : 'T) : bool =
fsEqualityComparer.Equals((box x), (box y))
It does boxing on arguments, which explains the significant amount of garbage in your first example. When GC comes into play too often, it will slow down the computation dramatically. The second example (using inline
) produces almost no garbage when Pair
is struct.
That said, it is the expected behavior of inline
keyword when a specialized version is used at the call site. My suggestion is always to try to optimize and measure your code on the same benchmarks.
You may be interested in a very similar thread Why is this F# code so slow?.
Type specialization
Without inline
, you are using generic comparison which is very inefficient. With inline
, the genericity is removed and int
comparison is used directly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With