I've written a simple test, which creates a variable, initializes it with zero and increments 100000000 times.
C++ does it in 0.36 s. Original C# version in 0.33s New in 0.8s F# in 12 seconds.
I don't use any functions, so the problem is not with generics by default
F# code
open System open System.Diagnostics // Learn more about F# at http://fsharp.org // See the 'F# Tutorial' project for more help. [<EntryPoint>] let main argv = let N = 100000000 let mutable x = 0 let watch = new Stopwatch(); watch.Start(); for i in seq{1..N} do x <- (x+1) printfn "%A" x printfn "%A" watch.Elapsed Console.ReadLine() |> ignore 0 // return an integer exit code
C++ code
#include<stdio.h> #include<string.h> #include<vector> #include<iostream> #include<time.h> using namespace std; int main() { const int N = 100000000; int x = 0; double start = clock(); for(int i=0;i<N;++i) { x = x + 1; } printf("%d\n",x); printf("%.4lf\n",(clock() - start)/CLOCKS_PER_SEC); return 0; }
C# code
using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; using System.Diagnostics; namespace SpeedTestCSharp { class Program { static void Main(string[] args) { const int N = 100000000; int x = 0; Stopwatch watch = new Stopwatch(); watch.Start(); foreach(int i in Enumerable.Range(0,N)) //Originally it was for(int i=0;i<N;++i) { x = x + 1; } Console.WriteLine(x); Console.WriteLine(watch.Elapsed); Console.ReadLine(); } } }
Edit
Replacing for (int i = 0; i < N; ++i)
with foreach(int i in Enumerable.Range(0,N))
makes C# program to run in about 0.8s, but it's still much faster than f#
Edit
Replaced DateTime
with StopWatch
the for F#/C#. Results are the same
The F-Test of overall significance in regression is a test of whether or not your linear regression model provides a better fit to a dataset than a model with no predictor variables.
An F-statistic is the ratio of two variances and it was named after Sir Ronald Fisher. Variances measure the dispersal of the data points around the mean. Higher variances occur when the individual data points tend to fall further from the mean.
The F-value in an ANOVA is calculated as: variation between sample means / variation within the samples. The higher the F-value in an ANOVA, the higher the variation between sample means relative to the variation within the samples. The higher the F-value, the lower the corresponding p-value.
This is very definitely happening directly as a consequence of using the expression:
for i in seq{1..N} do
On my machine, this gives the result:
100000000
00:00:09.1500924
If I change the loop to:
for i in 1..N do
The result changes dramatically:
100000000
00:00:00.1001864
Why?
The IL generated by these two approaches is quite different. The second case, using the 1..N
syntax simply gets compiled the same way as a C# for(int i=1; i<N+1; ++i)
loop.
The first case is quite different, this version produces a full sequence which is then enumerated by a foreach loop.
The C# and F# versions making use of IEnumerables
differ in that they use different range functions to generate them.
The C# version uses System.Linq.Enumerable.RangeIterator
to generate the value range, while the F# version uses Microsoft.FSharp.Core.Operators.OperatorIntrinsics.RangeInt32
. I think it's safe to assume that the performance difference we're seeing between the C# and F# versions in this particular case are a result of the performance characteristics of these two functions.
svick is correct to point out in his comment that the +
operator is actually being passed as an argument to the integralRangeStep
function.
For the non-trivial case where n <> m
this results in the F# compiler using a ProperIntegralRangeEnumerator
with the implementation found here: https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/FSharp.Core/prim-types.fs#L6463
let inline integralRangeStepEnumerator (zero,add,n,step,m,f) : IEnumerator<_> = // Generates sequence z_i where z_i = f (n + i.step) while n + i.step is in region (n,m) if n = m then new SingletonEnumerator<_> (f n) |> enumerator else let up = (n < m) let canStart = not (if up then step < zero else step > zero) // check for interval increasing, step decreasing // generate proper increasing sequence { new ProperIntegralRangeEnumerator<_,_>(n,m) with member x.CanStart = canStart member x.Before a b = if up then (a < b) else (a > b) member x.Equal a b = (a = b) member x.Step a = add a step member x.Result a = f a } |> enumerator
We can see that stepping through the Enumerator results in calls to the supplied add
function rather than a more straightforward, direct addition.
Note: All timings run in Release mode (Tail Calls: On, Optimisation: On).
I don't know F# very well so I wanted to take a peek at the code it produces. Here's the result. It just confirms TheInnerLight's answer.
First, C++ should be able to optimize your for
loop away, you'll get a zero (or near-zero) time. The .NET compilers and JIT currently don't perform this optimization, so let's compare them.
Here's the IL of the C# loop:
// [21 28 - 21 58] IL_000e: ldc.i4.0 IL_000f: ldc.i4 100000000 IL_0014: call class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [System.Core]System.Linq.Enumerable::Range(int32, int32) IL_0019: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0/*int32*/> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator() IL_001e: stloc.2 // V_2 .try { IL_001f: br.s IL_002c // [21 16 - 21 24] IL_0021: ldloc.2 // V_2 IL_0022: callvirt instance !0/*int32*/ class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current() IL_0027: pop // [22 9 - 22 15] IL_0028: ldloc.0 // num1 IL_0029: ldc.i4.1 IL_002a: add IL_002b: stloc.0 // num1 IL_002c: ldloc.2 // V_2 IL_002d: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext() IL_0032: brtrue.s IL_0021 IL_0034: leave.s IL_0040 } // end of .try finally { IL_0036: ldloc.2 // V_2 IL_0037: brfalse.s IL_003f IL_0039: ldloc.2 // V_2 IL_003a: callvirt instance void [mscorlib]System.IDisposable::Dispose() IL_003f: endfinally } // end of finally
And here's the IL of the F# loop:
// [23 5 - 23 138] IL_000f: ldc.i4.1 IL_0010: ldc.i4.1 IL_0011: ldc.i4 100000000 IL_0016: call class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [FSharp.Core]Microsoft.FSharp.Core.Operators/OperatorIntrinsics::RangeInt32(int32, int32, int32) IL_001b: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0/*int32*/> [FSharp.Core]Microsoft.FSharp.Core.Operators::CreateSequence<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0/*int32*/>) IL_0020: stloc.2 // V_2 IL_0021: ldloc.2 // V_2 IL_0022: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0/*int32*/> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator() IL_0027: stloc.3 // enumerator .try { // [26 7 - 26 36] IL_0028: ldloc.3 // enumerator IL_0029: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext() IL_002e: brfalse.s IL_003f // [28 9 - 28 41] IL_0030: ldloc.3 // enumerator IL_0031: callvirt instance !0/*int32*/ class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current() IL_0036: stloc.s current // [29 9 - 29 15] IL_0038: ldloc.0 // func IL_0039: ldc.i4.1 IL_003a: add IL_003b: stloc.0 // func IL_003c: nop IL_003d: br.s IL_0028 IL_003f: ldnull IL_0040: stloc.s V_4 IL_0042: leave.s IL_005d } // end of .try finally { // [34 7 - 34 57] IL_0044: ldloc.3 // enumerator IL_0045: isinst [mscorlib]System.IDisposable IL_004a: stloc.s disposable // [35 7 - 35 30] IL_004c: ldloc.s disposable IL_004e: brfalse.s IL_005a // [36 9 - 36 29] IL_0050: ldloc.s disposable IL_0052: callvirt instance void [mscorlib]System.IDisposable::Dispose() IL_0057: ldnull IL_0058: pop IL_0059: endfinally IL_005a: ldnull IL_005b: pop IL_005c: endfinally } // end of finally IL_005d: ldloc.s V_4 IL_005f: pop
So, while the loops are a bit different, they mainly do the same thing.
Here's what C# does:
MoveNext
part (just once)Current
property of the enumerable, and discard it 0
MoveNext
true
, or exit the loop on false
The F# loop does the following:
MoveNext
false
Current
property of the enumerable, and store its value in a local 0
nop
(sic) So we have two differences here:
Current
property's value while F# stores it in a localnop
(do nothing) instruction in the loop for some reason that's beyond me (and yes, this is Release mode).But these differences alone don't explain the huge performance impact. Let's take a look at what the JIT does with this.
Note: rcx
is the first argument in the x64 calling convention used, which corresponds to the this
implicit parameter in instance method calls.
C#, x64:
foreach (int i in Enumerable.Range(0, N)) 00007FFCF2B94514 xor ecx,ecx 00007FFCF2B94516 mov edx,5F5E100h 00007FFCF2B9451B call 00007FFD50EF08F0 // Call Enumerable.Range 00007FFCF2B94520 mov rcx,rax 00007FFCF2B94523 mov r11,7FFCF2A80040h 00007FFCF2B9452D cmp dword ptr [rcx],ecx 00007FFCF2B9452F call qword ptr [r11] // Call GetEnumerator 00007FFCF2B94532 mov qword ptr [rbp-20h],rax 00007FFCF2B94536 mov rcx,qword ptr [rbp-20h] // Store the IEnumerator in rcx 00007FFCF2B9453A mov r11,7FFCF2A80048h 00007FFCF2B94544 cmp dword ptr [rcx],ecx 00007FFCF2B94546 call qword ptr [r11] // Call MoveNext 00007FFCF2B94549 test al,al 00007FFCF2B9454B je 00007FFCF2B9457F // Skip the loop 00007FFCF2B9454D mov rcx,qword ptr [rbp-20h] // Store the IEnumerator in rcx 00007FFCF2B94551 mov r11,7FFCF2A80050h 00007FFCF2B9455B cmp dword ptr [rcx],ecx 00007FFCF2B9455D call qword ptr [r11] // Call get_Current { x = x + 1; 00007FFCF2B94560 mov ecx,dword ptr [rbp-0Ch] 00007FFCF2B94563 inc ecx 00007FFCF2B94565 mov dword ptr [rbp-0Ch],ecx foreach (int i in Enumerable.Range(0, N)) 00007FFCF2B94568 mov rcx,qword ptr [rbp-20h] // Store the IEnumerator in rcx 00007FFCF2B9456C mov r11,7FFCF2A80048h 00007FFCF2B94576 cmp dword ptr [rcx],ecx 00007FFCF2B94578 call qword ptr [r11] // Call MoveNext 00007FFCF2B9457B test al,al 00007FFCF2B9457D jne 00007FFCF2B9454D 00007FFCF2B9457F mov rcx,qword ptr [rsp+20h] 00007FFCF2B94584 call 00007FFCF2B945C6 00007FFCF2B94589 nop }
F#, x64:
for i in seq{1..N} do 00007FFCF2B904F4 mov ecx,1 00007FFCF2B904F9 mov edx,1 00007FFCF2B904FE mov r8d,5F5E100h 00007FFCF2B90504 call 00007FFD42AA2B80 // Create the sequence 00007FFCF2B90509 mov rcx,rax 00007FFCF2B9050C mov r11,7FFCF2A90020h 00007FFCF2B90516 cmp dword ptr [rcx],ecx 00007FFCF2B90518 call qword ptr [r11] // Call GetEnumerator 00007FFCF2B9051B mov qword ptr [rbp-20h],rax 00007FFCF2B9051F mov rcx,qword ptr [rbp-20h] // Store the IEnumerator in rcx 00007FFCF2B90523 mov r11,7FFCF2A90028h 00007FFCF2B9052D cmp dword ptr [rcx],ecx 00007FFCF2B9052F call qword ptr [r11] // Call MoveNext 00007FFCF2B90532 test al,al 00007FFCF2B90534 je 00007FFCF2B90553 // Exit the loop? x <- (x+1) 00007FFCF2B90536 mov rcx,qword ptr [rbp-20h] 00007FFCF2B9053A mov r11,7FFCF2A90030h 00007FFCF2B90544 cmp dword ptr [rcx],ecx 00007FFCF2B90546 call qword ptr [r11] // Call get_Current 00007FFCF2B90549 mov edx,dword ptr [rbp-0Ch] 00007FFCF2B9054C inc edx 00007FFCF2B9054E mov dword ptr [rbp-0Ch],edx 00007FFCF2B90551 jmp 00007FFCF2B9051F // Loop 00007FFCF2B90553 mov rcx,qword ptr [rsp+20h] 00007FFCF2B90558 call 00007FFCF2B9061C 00007FFCF2B9055D nop
First, we notice that C# still calls Current
even though it discards its result. This is a virtual call, which didn't get optimized away.
Oh and that F# nop
IL opcode is optimized away by the JIT. There is a nop
in the x64 code, but it's after the loop, and it's certainly here for purposes of alignment.
Then, we can see the code is very similar in the two cases, although it's structured a bit differently. It calls the same functions and doesn't do anything weird.
So yes, the performance difference you're seeing is certainly explained by the way F# constructs its sequence, not by its looping mechanism itself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With