Code example:
let foo1 (arr : int[]) =
for i = 0 to arr.Length-1 do
arr.[i] <- i
let foo2 (arr : int[]) =
for i in [0..arr.Length-1] do
arr.[i] <- i
I thought that this functions should be equivalent to each other (in terms of performance). But if we look into IL listing, we'll see:
First function, 15 lines, no dynamic allocations, no try
operator, no virtual calling:
IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: stloc.0
IL_0003: br.s IL_0011
// loop start (head: IL_0011)
IL_0005: ldarg.0
IL_0006: ldloc.0
IL_0007: ldloc.0
IL_0008: stelem.any [mscorlib]System.Int32
IL_000d: ldloc.0
IL_000e: ldc.i4.1
IL_000f: add
IL_0010: stloc.0
IL_0011: ldloc.0
IL_0012: ldarg.0
IL_0013: ldlen
IL_0014: conv.i4
IL_0015: blt.s IL_0005
// end loop
IL_0017: ret
and second one - almost 100 lines, lots of allocations/deallocations, callings of virtual functions, lots of try
/Dispose
:
IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: ldc.i4.1
IL_0003: ldarg.0
IL_0004: ldlen
IL_0005: conv.i4
IL_0006: ldc.i4.1
IL_0007: sub
IL_0008: call class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [FSharp.Core]Microsoft.FSharp.Core.Operators/OperatorIntrinsics::RangeInt32(int32, int32, int32)
IL_000d: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0> [FSharp.Core]Microsoft.FSharp.Core.Operators::CreateSequence<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_0012: call class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Collections.SeqModule::ToList<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_0017: stloc.0
IL_0018: ldloc.0
IL_0019: unbox.any class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>
IL_001e: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator()
IL_0023: stloc.1
.try
{
// loop start (head: IL_0024)
IL_0024: ldloc.1
IL_0025: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
IL_002a: brfalse.s IL_003e
IL_002c: ldloc.1
IL_002d: callvirt instance !0 class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()
IL_0032: stloc.3
IL_0033: ldarg.0
IL_0034: ldloc.3
IL_0035: ldloc.3
IL_0036: stelem.any [mscorlib]System.Int32
IL_003b: nop
IL_003c: br.s IL_0024
// end loop
IL_003e: ldnull
IL_003f: stloc.2
IL_0040: leave.s IL_005b
} // end .try
finally
{
IL_0042: ldloc.1
IL_0043: isinst [mscorlib]System.IDisposable
IL_0048: stloc.s 4
IL_004a: ldloc.s 4
IL_004c: brfalse.s IL_0058
IL_004e: ldloc.s 4
IL_0050: callvirt instance void [mscorlib]System.IDisposable::Dispose()
IL_0055: ldnull
IL_0056: pop
IL_0057: endfinally
IL_0058: ldnull
IL_0059: pop
IL_005a: endfinally
} // end handler
IL_005b: ldloc.2
IL_005c: pop
IL_005d: ret
My question is why does F# compiler uses so complicated code for foo2
? Why does it use an IEnumerable
to implement so trivial loop?
In the 2nd example if you use range expression, it will be converted into normal for
loop:
let foo2 (arr : int[]) =
for i in 0..arr.Length-1 do
arr.[i] <- i
and become equivalent to foo1
.
I quote Section 6.3.12 Range Expressions in F# language specs:
A sequence iteration expression of the form for var in expr1 .. expr2 do expr3 done is sometimes elaborated as a simple for loop-expression (§6.5.7).
However, your 2nd example is more like:
let foo2 (arr : int[]) =
let xs = [0..arr.Length-1] (* A new list is created *)
for i in xs do
arr.[i] <- i
where you have created a new list explicitly.
What your seeing is the standard difference between using an index based enumeration and an IEnumerable
based enumeration (or in C# terms for
vs foreach
).
In the second sample the expression [0..arr.Length-1]
is creating a collection and F# is using IEnumerable<T>
to enumerate the values. Part of this enumeration style involves the use of IEnumerator<T>
which implements IDisposable
. The try / finally
block you are seeing is generated to ensure that the IDisposable::Dispose
method is called at the end of the enumeration even in the face of an exception.
Could F# optimize the second example into the first and avoid all of the extra overhead? It's possible that they could do this optimization. Essentially peek through the range expression, not it's just a simple numeric range and hence generate the equivalent for
code.
Should F# optimize the second example. My vote would be no. Features like this often look trivial from the outside but actually implementing them, and more importantly maintaining them, can be rather expensive. An astute user could always convert their code back to the standard for
version and avoid the IEnumerable<T>
overhead (should the profiler reveal it to be an issue). Not implementing the optimization frees up the F# team to implement other awesome features.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With