I've written a simple test, which creates a variable, initializes it with zero and increments 100000000 times. C++ does it in 0.36 s. Original C# version in 0.33s New in 0.8s F# in 12 seconds. I don't use any functions, so the problem is not with generics by default F# code <pre class="prettyprint"><code>open System open System.Diagnostics // Learn more about F# at http://fsharp.org // See the 'F# Tutorial' project for more help. [<EntryPoint>] let main argv = let N = 100000000 let mutable x = 0 let watch = new Stopwatch(); watch.Start(); for i in seq{1..N} do x <- (x+1) printfn "%A" x printfn "%A" watch.Elapsed Console.ReadLine() |> ignore 0 // return an integer exit code </code></pre> C++ code <pre class="prettyprint"><code>#include<stdio.h> #include<string.h> #include<vector> #include<iostream> #include<time.h> using namespace std; int main() { const int N = 100000000; int x = 0; double start = clock(); for(int i=0;i<N;++i) { x = x + 1; } printf("%d\n",x); printf("%.4lf\n",(clock() - start)/CLOCKS_PER_SEC); return 0; } </code></pre> C# code <pre class="prettyprint"><code>using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; using System.Diagnostics; namespace SpeedTestCSharp { class Program { static void Main(string[] args) { const int N = 100000000; int x = 0; Stopwatch watch = new Stopwatch(); watch.Start(); foreach(int i in Enumerable.Range(0,N)) //Originally it was for(int i=0;i<N;++i) { x = x + 1; } Console.WriteLine(x); Console.WriteLine(watch.Elapsed); Console.ReadLine(); } } } </code></pre> Edit Replacing <code>for (int i = 0; i < N; ++i)</code> with <code>foreach(int i in Enumerable.Range(0,N))</code> makes C# program to run in about 0.8s, but it's still much faster than f# Edit Replaced <code>DateTime</code> with <code>StopWatch</code> the for F#/C#. Results are the same

This is very definitely happening directly as a consequence of using the expression: <pre class="prettyprint"><code>for i in seq{1..N} do </code></pre> On my machine, this gives the result: <blockquote> 100000000 00:00:09.1500924 </blockquote> If I change the loop to: <pre class="prettyprint"><code>for i in 1..N do </code></pre> The result changes dramatically: <blockquote> 100000000 00:00:00.1001864 </blockquote> Why? The IL generated by these two approaches is quite different. The second case, using the <code>1..N</code> syntax simply gets compiled the same way as a C# <code>for(int i=1; i<N+1; ++i)</code> loop. The first case is quite different, this version produces a full sequence which is then enumerated by a foreach loop. The C# and F# versions making use of <code>IEnumerables</code> differ in that they use different range functions to generate them. The C# version uses <code>System.Linq.Enumerable.RangeIterator</code> to generate the value range, while the F# version uses <code>Microsoft.FSharp.Core.Operators.OperatorIntrinsics.RangeInt32</code>. I think it's safe to assume that the performance difference we're seeing between the C# and F# versions in this particular case are a result of the performance characteristics of these two functions. svick is correct to point out in his comment that the <code>+</code> operator is actually being passed as an argument to the <code>integralRangeStep</code> function. For the non-trivial case where <code>n <> m</code> this results in the F# compiler using a <code>ProperIntegralRangeEnumerator</code> with the implementation found here: https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/FSharp.Core/prim-types.fs#L6463 <pre class="prettyprint"><code>let inline integralRangeStepEnumerator (zero,add,n,step,m,f) : IEnumerator<_> = // Generates sequence z_i where z_i = f (n + i.step) while n + i.step is in region (n,m) if n = m then new SingletonEnumerator<_> (f n) |> enumerator else let up = (n < m) let canStart = not (if up then step < zero else step > zero) // check for interval increasing, step decreasing // generate proper increasing sequence { new ProperIntegralRangeEnumerator<_,_>(n,m) with member x.CanStart = canStart member x.Before a b = if up then (a < b) else (a > b) member x.Equal a b = (a = b) member x.Step a = add a step member x.Result a = f a } |> enumerator </code></pre> We can see that stepping through the Enumerator results in calls to the supplied <code>add</code> function rather than a more straightforward, direct addition. Note: All timings run in Release mode (Tail Calls: On, Optimisation: On).

I don't know F# very well so I wanted to take a peek at the code it produces. Here's the result. It just confirms TheInnerLight's answer. First, C++ should be able to optimize your <code>for</code> loop away, you'll get a zero (or near-zero) time. The .NET compilers and JIT currently don't perform this optimization, so let's compare them. Here's the IL of the C# loop: <pre class="prettyprint lang-none prettyprint-override"><code>// [21 28 - 21 58] IL_000e: ldc.i4.0 IL_000f: ldc.i4 100000000 IL_0014: call class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [System.Core]System.Linq.Enumerable::Range(int32, int32) IL_0019: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0/*int32*/> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator() IL_001e: stloc.2 // V_2 .try { IL_001f: br.s IL_002c // [21 16 - 21 24] IL_0021: ldloc.2 // V_2 IL_0022: callvirt instance !0/*int32*/ class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current() IL_0027: pop // [22 9 - 22 15] IL_0028: ldloc.0 // num1 IL_0029: ldc.i4.1 IL_002a: add IL_002b: stloc.0 // num1 IL_002c: ldloc.2 // V_2 IL_002d: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext() IL_0032: brtrue.s IL_0021 IL_0034: leave.s IL_0040 } // end of .try finally { IL_0036: ldloc.2 // V_2 IL_0037: brfalse.s IL_003f IL_0039: ldloc.2 // V_2 IL_003a: callvirt instance void [mscorlib]System.IDisposable::Dispose() IL_003f: endfinally } // end of finally </code></pre> And here's the IL of the F# loop: <pre class="prettyprint lang-none prettyprint-override"><code>// [23 5 - 23 138] IL_000f: ldc.i4.1 IL_0010: ldc.i4.1 IL_0011: ldc.i4 100000000 IL_0016: call class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [FSharp.Core]Microsoft.FSharp.Core.Operators/OperatorIntrinsics::RangeInt32(int32, int32, int32) IL_001b: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0/*int32*/> [FSharp.Core]Microsoft.FSharp.Core.Operators::CreateSequence<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0/*int32*/>) IL_0020: stloc.2 // V_2 IL_0021: ldloc.2 // V_2 IL_0022: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0/*int32*/> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator() IL_0027: stloc.3 // enumerator .try { // [26 7 - 26 36] IL_0028: ldloc.3 // enumerator IL_0029: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext() IL_002e: brfalse.s IL_003f // [28 9 - 28 41] IL_0030: ldloc.3 // enumerator IL_0031: callvirt instance !0/*int32*/ class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current() IL_0036: stloc.s current // [29 9 - 29 15] IL_0038: ldloc.0 // func IL_0039: ldc.i4.1 IL_003a: add IL_003b: stloc.0 // func IL_003c: nop IL_003d: br.s IL_0028 IL_003f: ldnull IL_0040: stloc.s V_4 IL_0042: leave.s IL_005d } // end of .try finally { // [34 7 - 34 57] IL_0044: ldloc.3 // enumerator IL_0045: isinst [mscorlib]System.IDisposable IL_004a: stloc.s disposable // [35 7 - 35 30] IL_004c: ldloc.s disposable IL_004e: brfalse.s IL_005a // [36 9 - 36 29] IL_0050: ldloc.s disposable IL_0052: callvirt instance void [mscorlib]System.IDisposable::Dispose() IL_0057: ldnull IL_0058: pop IL_0059: endfinally IL_005a: ldnull IL_005b: pop IL_005c: endfinally } // end of finally IL_005d: ldloc.s V_4 IL_005f: pop </code></pre> So, while the loops are a bit different, they mainly do the same thing. Here's what C# does: <ul> <li>[0] Branch to the <code>MoveNext</code> part (just once)</li> <li>[1] Get the <code>Current</code> property of the enumerable, and discard it </li> <li>[2] Add 1 to the local <code>0</code> </li> <li>[3] Call <code>MoveNext</code> </li> <li>[4] Go back to [1] on <code>true</code>, or exit the loop on <code>false</code> </li> </ul> The F# loop does the following: <ul> <li>[0] Call <code>MoveNext</code> </li> <li>[1] Leave the loop on <code>false</code> </li> <li>[2] Get the <code>Current</code> property of the enumerable, and store its value in a local </li> <li>[3] Add 1 to the local <code>0</code> </li> <li>[4] Take a break with <code>nop</code> (sic) </li> <li>[5] Branch to [0]</li> </ul> So we have two differences here: <ul> <li>C# discards the <code>Current</code> property's value while F# stores it in a local</li> <li>F# has a <code>nop</code> (do nothing) instruction in the loop for some reason that's beyond me (and yes, this is Release mode).</li> </ul> But these differences alone don't explain the huge performance impact. Let's take a look at what the JIT does with this. Note: <code>rcx</code> is the first argument in the x64 calling convention used, which corresponds to the <code>this</code> implicit parameter in instance method calls. C#, x64: <pre class="prettyprint lang-none prettyprint-override"><code> foreach (int i in Enumerable.Range(0, N)) 00007FFCF2B94514 xor ecx,ecx 00007FFCF2B94516 mov edx,5F5E100h 00007FFCF2B9451B call 00007FFD50EF08F0 // Call Enumerable.Range 00007FFCF2B94520 mov rcx,rax 00007FFCF2B94523 mov r11,7FFCF2A80040h 00007FFCF2B9452D cmp dword ptr [rcx],ecx 00007FFCF2B9452F call qword ptr [r11] // Call GetEnumerator 00007FFCF2B94532 mov qword ptr [rbp-20h],rax 00007FFCF2B94536 mov rcx,qword ptr [rbp-20h] // Store the IEnumerator in rcx 00007FFCF2B9453A mov r11,7FFCF2A80048h 00007FFCF2B94544 cmp dword ptr [rcx],ecx 00007FFCF2B94546 call qword ptr [r11] // Call MoveNext 00007FFCF2B94549 test al,al 00007FFCF2B9454B je 00007FFCF2B9457F // Skip the loop 00007FFCF2B9454D mov rcx,qword ptr [rbp-20h] // Store the IEnumerator in rcx 00007FFCF2B94551 mov r11,7FFCF2A80050h 00007FFCF2B9455B cmp dword ptr [rcx],ecx 00007FFCF2B9455D call qword ptr [r11] // Call get_Current { x = x + 1; 00007FFCF2B94560 mov ecx,dword ptr [rbp-0Ch] 00007FFCF2B94563 inc ecx 00007FFCF2B94565 mov dword ptr [rbp-0Ch],ecx foreach (int i in Enumerable.Range(0, N)) 00007FFCF2B94568 mov rcx,qword ptr [rbp-20h] // Store the IEnumerator in rcx 00007FFCF2B9456C mov r11,7FFCF2A80048h 00007FFCF2B94576 cmp dword ptr [rcx],ecx 00007FFCF2B94578 call qword ptr [r11] // Call MoveNext 00007FFCF2B9457B test al,al 00007FFCF2B9457D jne 00007FFCF2B9454D 00007FFCF2B9457F mov rcx,qword ptr [rsp+20h] 00007FFCF2B94584 call 00007FFCF2B945C6 00007FFCF2B94589 nop } </code></pre> F#, x64: <pre class="prettyprint lang-none prettyprint-override"><code> for i in seq{1..N} do 00007FFCF2B904F4 mov ecx,1 00007FFCF2B904F9 mov edx,1 00007FFCF2B904FE mov r8d,5F5E100h 00007FFCF2B90504 call 00007FFD42AA2B80 // Create the sequence 00007FFCF2B90509 mov rcx,rax 00007FFCF2B9050C mov r11,7FFCF2A90020h 00007FFCF2B90516 cmp dword ptr [rcx],ecx 00007FFCF2B90518 call qword ptr [r11] // Call GetEnumerator 00007FFCF2B9051B mov qword ptr [rbp-20h],rax 00007FFCF2B9051F mov rcx,qword ptr [rbp-20h] // Store the IEnumerator in rcx 00007FFCF2B90523 mov r11,7FFCF2A90028h 00007FFCF2B9052D cmp dword ptr [rcx],ecx 00007FFCF2B9052F call qword ptr [r11] // Call MoveNext 00007FFCF2B90532 test al,al 00007FFCF2B90534 je 00007FFCF2B90553 // Exit the loop? x <- (x+1) 00007FFCF2B90536 mov rcx,qword ptr [rbp-20h] 00007FFCF2B9053A mov r11,7FFCF2A90030h 00007FFCF2B90544 cmp dword ptr [rcx],ecx 00007FFCF2B90546 call qword ptr [r11] // Call get_Current 00007FFCF2B90549 mov edx,dword ptr [rbp-0Ch] 00007FFCF2B9054C inc edx 00007FFCF2B9054E mov dword ptr [rbp-0Ch],edx 00007FFCF2B90551 jmp 00007FFCF2B9051F // Loop 00007FFCF2B90553 mov rcx,qword ptr [rsp+20h] 00007FFCF2B90558 call 00007FFCF2B9061C 00007FFCF2B9055D nop </code></pre> First, we notice that C# still calls <code>Current</code> even though it discards its result. This is a virtual call, which didn't get optimized away. Oh and that F# <code>nop</code> IL opcode is optimized away by the JIT. There is a <code>nop</code> in the x64 code, but it's after the loop, and it's certainly here for purposes of alignment. Then, we can see the code is very similar in the two cases, although it's structured a bit differently. It calls the same functions and doesn't do anything weird. So yes, the performance difference you're seeing is certainly explained by the way F# constructs its sequence, not by its looping mechanism itself.

Why is this simple F# code 36 times slower than C#/C++ versions?

Q: What is the significance F in regression?

The F-Test of overall significance in regression is a test of whether or not your linear regression model provides a better fit to a dataset than a model with no predictor variables.

Q: What does F mean in statistics?

An F-statistic is the ratio of two variances and it was named after Sir Ronald Fisher. Variances measure the dispersal of the data points around the mean. Higher variances occur when the individual data points tend to fall further from the mean.

Q: What is a significant F value in ANOVA?

The F-value in an ANOVA is calculated as: variation between sample means / variation within the samples. The higher the F-value in an ANOVA, the higher the variation between sample means relative to the variation within the samples. The higher the F-value, the lower the corresponding p-value.

Tags:

c++

performance

c#

f#

I've written a simple test, which creates a variable, initializes it with zero and increments 100000000 times.

C++ does it in 0.36 s. Original C# version in 0.33s New in 0.8s F# in 12 seconds.

I don't use any functions, so the problem is not with generics by default

F# code

open System open System.Diagnostics // Learn more about F# at http://fsharp.org // See the 'F# Tutorial' project for more help. [<EntryPoint>] let main argv =      let N = 100000000     let mutable x = 0     let watch = new Stopwatch();     watch.Start();     for i in seq{1..N} do         x <- (x+1)     printfn "%A" x     printfn "%A" watch.Elapsed     Console.ReadLine()         |> ignore     0 // return an integer exit code

C++ code

#include<stdio.h> #include<string.h> #include<vector> #include<iostream> #include<time.h> using namespace std; int main() {     const int N = 100000000;     int x = 0;     double start = clock();     for(int i=0;i<N;++i)     {         x = x + 1;     }     printf("%d\n",x);     printf("%.4lf\n",(clock() - start)/CLOCKS_PER_SEC);     return 0; }

C# code

using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; using System.Diagnostics;  namespace SpeedTestCSharp {     class Program     {         static void Main(string[] args)         {             const int N = 100000000;             int x = 0;             Stopwatch watch = new Stopwatch();             watch.Start();              foreach(int i in Enumerable.Range(0,N))             //Originally it was for(int i=0;i<N;++i)             {                 x = x + 1;             }             Console.WriteLine(x);             Console.WriteLine(watch.Elapsed);             Console.ReadLine();         }     } }

Edit

Replacing for (int i = 0; i < N; ++i) with foreach(int i in Enumerable.Range(0,N)) makes C# program to run in about 0.8s, but it's still much faster than f#

Edit

Replaced DateTime with StopWatch the for F#/C#. Results are the same

846

asked Feb 13 '16 09:02

user2136963

2 Answers

This is very definitely happening directly as a consequence of using the expression:

for i in seq{1..N} do

On my machine, this gives the result:

100000000

00:00:09.1500924

If I change the loop to:

for i in 1..N do

The result changes dramatically:

100000000

00:00:00.1001864

Why?

The IL generated by these two approaches is quite different. The second case, using the 1..N syntax simply gets compiled the same way as a C# for(int i=1; i<N+1; ++i) loop.

The first case is quite different, this version produces a full sequence which is then enumerated by a foreach loop.

The C# and F# versions making use of IEnumerables differ in that they use different range functions to generate them.

The C# version uses System.Linq.Enumerable.RangeIterator to generate the value range, while the F# version uses Microsoft.FSharp.Core.Operators.OperatorIntrinsics.RangeInt32. I think it's safe to assume that the performance difference we're seeing between the C# and F# versions in this particular case are a result of the performance characteristics of these two functions.

svick is correct to point out in his comment that the + operator is actually being passed as an argument to the integralRangeStep function.

For the non-trivial case where n <> m this results in the F# compiler using a ProperIntegralRangeEnumerator with the implementation found here: https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/FSharp.Core/prim-types.fs#L6463

let inline integralRangeStepEnumerator (zero,add,n,step,m,f) : IEnumerator<_> =     // Generates sequence z_i where z_i = f (n + i.step) while n + i.step is in region (n,m)     if n = m then         new SingletonEnumerator<_> (f n) |> enumerator      else         let up = (n < m)         let canStart = not (if up then step < zero else step > zero) // check for interval increasing, step decreasing          // generate proper increasing sequence         { new ProperIntegralRangeEnumerator<_,_>(n,m) with                  member x.CanStart = canStart                 member x.Before a b = if up then (a < b) else (a > b)                 member x.Equal a b = (a = b)                 member x.Step a = add a step                 member x.Result a = f a } |> enumerator

We can see that stepping through the Enumerator results in calls to the supplied add function rather than a more straightforward, direct addition.

Note: All timings run in Release mode (Tail Calls: On, Optimisation: On).

answered Sep 19 '22 01:09

TheInnerLight

I don't know F# very well so I wanted to take a peek at the code it produces. Here's the result. It just confirms TheInnerLight's answer.

First, C++ should be able to optimize your for loop away, you'll get a zero (or near-zero) time. The .NET compilers and JIT currently don't perform this optimization, so let's compare them.

Here's the IL of the C# loop:

// [21 28 - 21 58] IL_000e: ldc.i4.0      IL_000f: ldc.i4       100000000 IL_0014: call         class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [System.Core]System.Linq.Enumerable::Range(int32, int32) IL_0019: callvirt     instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0/*int32*/> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator() IL_001e: stloc.2      // V_2 .try {    IL_001f: br.s         IL_002c  // [21 16 - 21 24]   IL_0021: ldloc.2      // V_2   IL_0022: callvirt     instance !0/*int32*/ class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()   IL_0027: pop            // [22 9 - 22 15]   IL_0028: ldloc.0      // num1   IL_0029: ldc.i4.1        IL_002a: add             IL_002b: stloc.0      // num1    IL_002c: ldloc.2      // V_2   IL_002d: callvirt     instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()   IL_0032: brtrue.s     IL_0021   IL_0034: leave.s      IL_0040 } // end of .try finally {   IL_0036: ldloc.2      // V_2   IL_0037: brfalse.s    IL_003f   IL_0039: ldloc.2      // V_2   IL_003a: callvirt     instance void [mscorlib]System.IDisposable::Dispose()   IL_003f: endfinally    } // end of finally

And here's the IL of the F# loop:

// [23 5 - 23 138] IL_000f: ldc.i4.1      IL_0010: ldc.i4.1      IL_0011: ldc.i4       100000000 IL_0016: call         class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [FSharp.Core]Microsoft.FSharp.Core.Operators/OperatorIntrinsics::RangeInt32(int32, int32, int32) IL_001b: call         class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0/*int32*/> [FSharp.Core]Microsoft.FSharp.Core.Operators::CreateSequence<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0/*int32*/>) IL_0020: stloc.2      // V_2 IL_0021: ldloc.2      // V_2 IL_0022: callvirt     instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0/*int32*/> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator() IL_0027: stloc.3      // enumerator .try {  // [26 7 - 26 36]   IL_0028: ldloc.3      // enumerator   IL_0029: callvirt     instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()   IL_002e: brfalse.s    IL_003f  // [28 9 - 28 41]   IL_0030: ldloc.3      // enumerator   IL_0031: callvirt     instance !0/*int32*/ class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()   IL_0036: stloc.s      current  // [29 9 - 29 15]   IL_0038: ldloc.0      // func   IL_0039: ldc.i4.1        IL_003a: add             IL_003b: stloc.0      // func   IL_003c: nop              IL_003d: br.s         IL_0028   IL_003f: ldnull          IL_0040: stloc.s      V_4   IL_0042: leave.s      IL_005d } // end of .try finally {  // [34 7 - 34 57]   IL_0044: ldloc.3      // enumerator   IL_0045: isinst       [mscorlib]System.IDisposable   IL_004a: stloc.s      disposable  // [35 7 - 35 30]   IL_004c: ldloc.s      disposable   IL_004e: brfalse.s    IL_005a  // [36 9 - 36 29]   IL_0050: ldloc.s      disposable   IL_0052: callvirt     instance void [mscorlib]System.IDisposable::Dispose()    IL_0057: ldnull          IL_0058: pop             IL_0059: endfinally      IL_005a: ldnull          IL_005b: pop             IL_005c: endfinally    } // end of finally IL_005d: ldloc.s      V_4 IL_005f: pop

So, while the loops are a bit different, they mainly do the same thing.

Here's what C# does:

[0] Branch to the MoveNext part (just once)
[1] Get the Current property of the enumerable, and discard it
[2] Add 1 to the local 0
[3] Call MoveNext
[4] Go back to [1] on true, or exit the loop on false

The F# loop does the following:

[0] Call MoveNext
[1] Leave the loop on false
[2] Get the Current property of the enumerable, and store its value in a local
[3] Add 1 to the local 0
[4] Take a break with nop (sic)
[5] Branch to [0]

So we have two differences here:

C# discards the Current property's value while F# stores it in a local
F# has a nop (do nothing) instruction in the loop for some reason that's beyond me (and yes, this is Release mode).

But these differences alone don't explain the huge performance impact. Let's take a look at what the JIT does with this.

Note: rcx is the first argument in the x64 calling convention used, which corresponds to the this implicit parameter in instance method calls.

C#, x64:

            foreach (int i in Enumerable.Range(0, N)) 00007FFCF2B94514  xor         ecx,ecx   00007FFCF2B94516  mov         edx,5F5E100h   00007FFCF2B9451B  call        00007FFD50EF08F0          // Call Enumerable.Range 00007FFCF2B94520  mov         rcx,rax   00007FFCF2B94523  mov         r11,7FFCF2A80040h 00007FFCF2B9452D  cmp         dword ptr [rcx],ecx   00007FFCF2B9452F  call        qword ptr [r11]           // Call GetEnumerator 00007FFCF2B94532  mov         qword ptr [rbp-20h],rax   00007FFCF2B94536  mov         rcx,qword ptr [rbp-20h]   // Store the IEnumerator in rcx 00007FFCF2B9453A  mov         r11,7FFCF2A80048h         00007FFCF2B94544  cmp         dword ptr [rcx],ecx   00007FFCF2B94546  call        qword ptr [r11]           // Call MoveNext 00007FFCF2B94549  test        al,al   00007FFCF2B9454B  je          00007FFCF2B9457F          // Skip the loop 00007FFCF2B9454D  mov         rcx,qword ptr [rbp-20h]   // Store the IEnumerator in rcx 00007FFCF2B94551  mov         r11,7FFCF2A80050h   00007FFCF2B9455B  cmp         dword ptr [rcx],ecx   00007FFCF2B9455D  call        qword ptr [r11]           // Call get_Current             {                 x = x + 1; 00007FFCF2B94560  mov         ecx,dword ptr [rbp-0Ch]   00007FFCF2B94563  inc         ecx                        00007FFCF2B94565  mov         dword ptr [rbp-0Ch],ecx               foreach (int i in Enumerable.Range(0, N)) 00007FFCF2B94568  mov         rcx,qword ptr [rbp-20h]   // Store the IEnumerator in rcx 00007FFCF2B9456C  mov         r11,7FFCF2A80048h   00007FFCF2B94576  cmp         dword ptr [rcx],ecx   00007FFCF2B94578  call        qword ptr [r11]           // Call MoveNext 00007FFCF2B9457B  test        al,al   00007FFCF2B9457D  jne         00007FFCF2B9454D   00007FFCF2B9457F  mov         rcx,qword ptr [rsp+20h]   00007FFCF2B94584  call        00007FFCF2B945C6   00007FFCF2B94589  nop               }

F#, x64:

    for i in seq{1..N} do 00007FFCF2B904F4  mov         ecx,1   00007FFCF2B904F9  mov         edx,1   00007FFCF2B904FE  mov         r8d,5F5E100h   00007FFCF2B90504  call        00007FFD42AA2B80          // Create the sequence 00007FFCF2B90509  mov         rcx,rax   00007FFCF2B9050C  mov         r11,7FFCF2A90020h   00007FFCF2B90516  cmp         dword ptr [rcx],ecx   00007FFCF2B90518  call        qword ptr [r11]           // Call GetEnumerator 00007FFCF2B9051B  mov         qword ptr [rbp-20h],rax   00007FFCF2B9051F  mov         rcx,qword ptr [rbp-20h]   // Store the IEnumerator in rcx 00007FFCF2B90523  mov         r11,7FFCF2A90028h   00007FFCF2B9052D  cmp         dword ptr [rcx],ecx   00007FFCF2B9052F  call        qword ptr [r11]           // Call MoveNext   00007FFCF2B90532  test        al,al   00007FFCF2B90534  je          00007FFCF2B90553          // Exit the loop?         x <- (x+1) 00007FFCF2B90536  mov         rcx,qword ptr [rbp-20h]   00007FFCF2B9053A  mov         r11,7FFCF2A90030h   00007FFCF2B90544  cmp         dword ptr [rcx],ecx   00007FFCF2B90546  call        qword ptr [r11]           // Call get_Current 00007FFCF2B90549  mov         edx,dword ptr [rbp-0Ch]   00007FFCF2B9054C  inc         edx   00007FFCF2B9054E  mov         dword ptr [rbp-0Ch],edx   00007FFCF2B90551  jmp         00007FFCF2B9051F          // Loop 00007FFCF2B90553  mov         rcx,qword ptr [rsp+20h]   00007FFCF2B90558  call        00007FFCF2B9061C   00007FFCF2B9055D  nop

First, we notice that C# still calls Current even though it discards its result. This is a virtual call, which didn't get optimized away.

Oh and that F# nop IL opcode is optimized away by the JIT. There is a nop in the x64 code, but it's after the loop, and it's certainly here for purposes of alignment.

Then, we can see the code is very similar in the two cases, although it's structured a bit differently. It calls the same functions and doesn't do anything weird.

So yes, the performance difference you're seeing is certainly explained by the way F# constructs its sequence, not by its looping mechanism itself.

answered Sep 18 '22 01:09

Lucas Trzesniewski

Related questions
                            
                                What is the difference between Const and Static in C#?
                            
                                Convert to Stream from a Url
                            
                                smart way to generate unique random number
                            
                                HtmlAgility - Save parsing to a string
                            
                                WinForms Close() sets DialogResult to Cancel
                            
                                Reference to undeclared entity 'nbsp' while reading xml from URL in c#?
                            
                                How to get a random number from a range, excluding some values
                            
                                x:Static in UWP XAML
                            
                                C# readonly vs Get
                            
                                How to check if the mail has been sent successfully
                            
                                Equality between two enumerables
                            
                                Does a thread close automatically?
                            
                                log4net with ASP.NET MVC: nothing happens
                            
                                True Unsafe Code Performance
                            
                                Iterate through class fields and print them
                            
                                Server tags cannot contain <% ... %> constructs
                            
                                Clean the string? is there any better way of doing it?
                            
                                How to encrypt JWT security token?
                            
                                C# regex to get video id from youtube and vimeo by url
                            
                                C# creating instance of class and set properties by name in string

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With