Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this simple F# code 36 times slower than C#/C++ versions?

I've written a simple test, which creates a variable, initializes it with zero and increments 100000000 times.

C++ does it in 0.36 s. Original C# version in 0.33s New in 0.8s F# in 12 seconds.

I don't use any functions, so the problem is not with generics by default

F# code

open System open System.Diagnostics // Learn more about F# at http://fsharp.org // See the 'F# Tutorial' project for more help. [<EntryPoint>] let main argv =      let N = 100000000     let mutable x = 0     let watch = new Stopwatch();     watch.Start();     for i in seq{1..N} do         x <- (x+1)     printfn "%A" x     printfn "%A" watch.Elapsed     Console.ReadLine()         |> ignore     0 // return an integer exit code 

C++ code

#include<stdio.h> #include<string.h> #include<vector> #include<iostream> #include<time.h> using namespace std; int main() {     const int N = 100000000;     int x = 0;     double start = clock();     for(int i=0;i<N;++i)     {         x = x + 1;     }     printf("%d\n",x);     printf("%.4lf\n",(clock() - start)/CLOCKS_PER_SEC);     return 0; } 

C# code

using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; using System.Diagnostics;  namespace SpeedTestCSharp {     class Program     {         static void Main(string[] args)         {             const int N = 100000000;             int x = 0;             Stopwatch watch = new Stopwatch();             watch.Start();              foreach(int i in Enumerable.Range(0,N))             //Originally it was for(int i=0;i<N;++i)             {                 x = x + 1;             }             Console.WriteLine(x);             Console.WriteLine(watch.Elapsed);             Console.ReadLine();         }     } } 

Edit

Replacing for (int i = 0; i < N; ++i) with foreach(int i in Enumerable.Range(0,N)) makes C# program to run in about 0.8s, but it's still much faster than f#

Edit

Replaced DateTime with StopWatch the for F#/C#. Results are the same

like image 846
user2136963 Avatar asked Feb 13 '16 09:02

user2136963


People also ask

What is the significance F in regression?

The F-Test of overall significance in regression is a test of whether or not your linear regression model provides a better fit to a dataset than a model with no predictor variables.

What does F mean in statistics?

An F-statistic is the ratio of two variances and it was named after Sir Ronald Fisher. Variances measure the dispersal of the data points around the mean. Higher variances occur when the individual data points tend to fall further from the mean.

What is a significant F value in ANOVA?

The F-value in an ANOVA is calculated as: variation between sample means / variation within the samples. The higher the F-value in an ANOVA, the higher the variation between sample means relative to the variation within the samples. The higher the F-value, the lower the corresponding p-value.


2 Answers

This is very definitely happening directly as a consequence of using the expression:

for i in seq{1..N} do 

On my machine, this gives the result:

100000000

00:00:09.1500924

If I change the loop to:

for i in 1..N do 

The result changes dramatically:

100000000

00:00:00.1001864

Why?

The IL generated by these two approaches is quite different. The second case, using the 1..N syntax simply gets compiled the same way as a C# for(int i=1; i<N+1; ++i) loop.

The first case is quite different, this version produces a full sequence which is then enumerated by a foreach loop.

The C# and F# versions making use of IEnumerables differ in that they use different range functions to generate them.

The C# version uses System.Linq.Enumerable.RangeIterator to generate the value range, while the F# version uses Microsoft.FSharp.Core.Operators.OperatorIntrinsics.RangeInt32. I think it's safe to assume that the performance difference we're seeing between the C# and F# versions in this particular case are a result of the performance characteristics of these two functions.

svick is correct to point out in his comment that the + operator is actually being passed as an argument to the integralRangeStep function.

For the non-trivial case where n <> m this results in the F# compiler using a ProperIntegralRangeEnumerator with the implementation found here: https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/FSharp.Core/prim-types.fs#L6463

let inline integralRangeStepEnumerator (zero,add,n,step,m,f) : IEnumerator<_> =     // Generates sequence z_i where z_i = f (n + i.step) while n + i.step is in region (n,m)     if n = m then         new SingletonEnumerator<_> (f n) |> enumerator      else         let up = (n < m)         let canStart = not (if up then step < zero else step > zero) // check for interval increasing, step decreasing          // generate proper increasing sequence         { new ProperIntegralRangeEnumerator<_,_>(n,m) with                  member x.CanStart = canStart                 member x.Before a b = if up then (a < b) else (a > b)                 member x.Equal a b = (a = b)                 member x.Step a = add a step                 member x.Result a = f a } |> enumerator 

We can see that stepping through the Enumerator results in calls to the supplied add function rather than a more straightforward, direct addition.

Note: All timings run in Release mode (Tail Calls: On, Optimisation: On).

like image 66
TheInnerLight Avatar answered Sep 19 '22 01:09

TheInnerLight


I don't know F# very well so I wanted to take a peek at the code it produces. Here's the result. It just confirms TheInnerLight's answer.

First, C++ should be able to optimize your for loop away, you'll get a zero (or near-zero) time. The .NET compilers and JIT currently don't perform this optimization, so let's compare them.

Here's the IL of the C# loop:

// [21 28 - 21 58] IL_000e: ldc.i4.0      IL_000f: ldc.i4       100000000 IL_0014: call         class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [System.Core]System.Linq.Enumerable::Range(int32, int32) IL_0019: callvirt     instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0/*int32*/> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator() IL_001e: stloc.2      // V_2 .try {    IL_001f: br.s         IL_002c  // [21 16 - 21 24]   IL_0021: ldloc.2      // V_2   IL_0022: callvirt     instance !0/*int32*/ class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()   IL_0027: pop            // [22 9 - 22 15]   IL_0028: ldloc.0      // num1   IL_0029: ldc.i4.1        IL_002a: add             IL_002b: stloc.0      // num1    IL_002c: ldloc.2      // V_2   IL_002d: callvirt     instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()   IL_0032: brtrue.s     IL_0021   IL_0034: leave.s      IL_0040 } // end of .try finally {   IL_0036: ldloc.2      // V_2   IL_0037: brfalse.s    IL_003f   IL_0039: ldloc.2      // V_2   IL_003a: callvirt     instance void [mscorlib]System.IDisposable::Dispose()   IL_003f: endfinally    } // end of finally 

And here's the IL of the F# loop:

// [23 5 - 23 138] IL_000f: ldc.i4.1      IL_0010: ldc.i4.1      IL_0011: ldc.i4       100000000 IL_0016: call         class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [FSharp.Core]Microsoft.FSharp.Core.Operators/OperatorIntrinsics::RangeInt32(int32, int32, int32) IL_001b: call         class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0/*int32*/> [FSharp.Core]Microsoft.FSharp.Core.Operators::CreateSequence<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0/*int32*/>) IL_0020: stloc.2      // V_2 IL_0021: ldloc.2      // V_2 IL_0022: callvirt     instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0/*int32*/> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator() IL_0027: stloc.3      // enumerator .try {  // [26 7 - 26 36]   IL_0028: ldloc.3      // enumerator   IL_0029: callvirt     instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()   IL_002e: brfalse.s    IL_003f  // [28 9 - 28 41]   IL_0030: ldloc.3      // enumerator   IL_0031: callvirt     instance !0/*int32*/ class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()   IL_0036: stloc.s      current  // [29 9 - 29 15]   IL_0038: ldloc.0      // func   IL_0039: ldc.i4.1        IL_003a: add             IL_003b: stloc.0      // func   IL_003c: nop              IL_003d: br.s         IL_0028   IL_003f: ldnull          IL_0040: stloc.s      V_4   IL_0042: leave.s      IL_005d } // end of .try finally {  // [34 7 - 34 57]   IL_0044: ldloc.3      // enumerator   IL_0045: isinst       [mscorlib]System.IDisposable   IL_004a: stloc.s      disposable  // [35 7 - 35 30]   IL_004c: ldloc.s      disposable   IL_004e: brfalse.s    IL_005a  // [36 9 - 36 29]   IL_0050: ldloc.s      disposable   IL_0052: callvirt     instance void [mscorlib]System.IDisposable::Dispose()    IL_0057: ldnull          IL_0058: pop             IL_0059: endfinally      IL_005a: ldnull          IL_005b: pop             IL_005c: endfinally    } // end of finally IL_005d: ldloc.s      V_4 IL_005f: pop           

So, while the loops are a bit different, they mainly do the same thing.

Here's what C# does:

  • [0] Branch to the MoveNext part (just once)
  • [1] Get the Current property of the enumerable, and discard it
  • [2] Add 1 to the local 0
  • [3] Call MoveNext
  • [4] Go back to [1] on true, or exit the loop on false

The F# loop does the following:

  • [0] Call MoveNext
  • [1] Leave the loop on false
  • [2] Get the Current property of the enumerable, and store its value in a local
  • [3] Add 1 to the local 0
  • [4] Take a break with nop (sic)
  • [5] Branch to [0]

So we have two differences here:

  • C# discards the Current property's value while F# stores it in a local
  • F# has a nop (do nothing) instruction in the loop for some reason that's beyond me (and yes, this is Release mode).

But these differences alone don't explain the huge performance impact. Let's take a look at what the JIT does with this.

Note: rcx is the first argument in the x64 calling convention used, which corresponds to the this implicit parameter in instance method calls.

C#, x64:

            foreach (int i in Enumerable.Range(0, N)) 00007FFCF2B94514  xor         ecx,ecx   00007FFCF2B94516  mov         edx,5F5E100h   00007FFCF2B9451B  call        00007FFD50EF08F0          // Call Enumerable.Range 00007FFCF2B94520  mov         rcx,rax   00007FFCF2B94523  mov         r11,7FFCF2A80040h 00007FFCF2B9452D  cmp         dword ptr [rcx],ecx   00007FFCF2B9452F  call        qword ptr [r11]           // Call GetEnumerator 00007FFCF2B94532  mov         qword ptr [rbp-20h],rax   00007FFCF2B94536  mov         rcx,qword ptr [rbp-20h]   // Store the IEnumerator in rcx 00007FFCF2B9453A  mov         r11,7FFCF2A80048h         00007FFCF2B94544  cmp         dword ptr [rcx],ecx   00007FFCF2B94546  call        qword ptr [r11]           // Call MoveNext 00007FFCF2B94549  test        al,al   00007FFCF2B9454B  je          00007FFCF2B9457F          // Skip the loop 00007FFCF2B9454D  mov         rcx,qword ptr [rbp-20h]   // Store the IEnumerator in rcx 00007FFCF2B94551  mov         r11,7FFCF2A80050h   00007FFCF2B9455B  cmp         dword ptr [rcx],ecx   00007FFCF2B9455D  call        qword ptr [r11]           // Call get_Current             {                 x = x + 1; 00007FFCF2B94560  mov         ecx,dword ptr [rbp-0Ch]   00007FFCF2B94563  inc         ecx                        00007FFCF2B94565  mov         dword ptr [rbp-0Ch],ecx               foreach (int i in Enumerable.Range(0, N)) 00007FFCF2B94568  mov         rcx,qword ptr [rbp-20h]   // Store the IEnumerator in rcx 00007FFCF2B9456C  mov         r11,7FFCF2A80048h   00007FFCF2B94576  cmp         dword ptr [rcx],ecx   00007FFCF2B94578  call        qword ptr [r11]           // Call MoveNext 00007FFCF2B9457B  test        al,al   00007FFCF2B9457D  jne         00007FFCF2B9454D   00007FFCF2B9457F  mov         rcx,qword ptr [rsp+20h]   00007FFCF2B94584  call        00007FFCF2B945C6   00007FFCF2B94589  nop               } 

F#, x64:

    for i in seq{1..N} do 00007FFCF2B904F4  mov         ecx,1   00007FFCF2B904F9  mov         edx,1   00007FFCF2B904FE  mov         r8d,5F5E100h   00007FFCF2B90504  call        00007FFD42AA2B80          // Create the sequence 00007FFCF2B90509  mov         rcx,rax   00007FFCF2B9050C  mov         r11,7FFCF2A90020h   00007FFCF2B90516  cmp         dword ptr [rcx],ecx   00007FFCF2B90518  call        qword ptr [r11]           // Call GetEnumerator 00007FFCF2B9051B  mov         qword ptr [rbp-20h],rax   00007FFCF2B9051F  mov         rcx,qword ptr [rbp-20h]   // Store the IEnumerator in rcx 00007FFCF2B90523  mov         r11,7FFCF2A90028h   00007FFCF2B9052D  cmp         dword ptr [rcx],ecx   00007FFCF2B9052F  call        qword ptr [r11]           // Call MoveNext   00007FFCF2B90532  test        al,al   00007FFCF2B90534  je          00007FFCF2B90553          // Exit the loop?         x <- (x+1) 00007FFCF2B90536  mov         rcx,qword ptr [rbp-20h]   00007FFCF2B9053A  mov         r11,7FFCF2A90030h   00007FFCF2B90544  cmp         dword ptr [rcx],ecx   00007FFCF2B90546  call        qword ptr [r11]           // Call get_Current 00007FFCF2B90549  mov         edx,dword ptr [rbp-0Ch]   00007FFCF2B9054C  inc         edx   00007FFCF2B9054E  mov         dword ptr [rbp-0Ch],edx   00007FFCF2B90551  jmp         00007FFCF2B9051F          // Loop 00007FFCF2B90553  mov         rcx,qword ptr [rsp+20h]   00007FFCF2B90558  call        00007FFCF2B9061C   00007FFCF2B9055D  nop    

First, we notice that C# still calls Current even though it discards its result. This is a virtual call, which didn't get optimized away.

Oh and that F# nop IL opcode is optimized away by the JIT. There is a nop in the x64 code, but it's after the loop, and it's certainly here for purposes of alignment.

Then, we can see the code is very similar in the two cases, although it's structured a bit differently. It calls the same functions and doesn't do anything weird.

So yes, the performance difference you're seeing is certainly explained by the way F# constructs its sequence, not by its looping mechanism itself.

like image 38
Lucas Trzesniewski Avatar answered Sep 18 '22 01:09

Lucas Trzesniewski