F# "for loop" optimization

Question

Code example:

let foo1 (arr : int[]) = 
    for i = 0 to arr.Length-1 do
        arr.[i] <- i

let foo2 (arr : int[]) = 
    for i in [0..arr.Length-1] do
        arr.[i] <- i

I thought that this functions should be equivalent to each other (in terms of performance). But if we look into IL listing, we'll see:

First function, 15 lines, no dynamic allocations, no try operator, no virtual calling:

IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: stloc.0
IL_0003: br.s IL_0011
// loop start (head: IL_0011)
    IL_0005: ldarg.0
    IL_0006: ldloc.0
    IL_0007: ldloc.0
    IL_0008: stelem.any [mscorlib]System.Int32
    IL_000d: ldloc.0
    IL_000e: ldc.i4.1
    IL_000f: add
    IL_0010: stloc.0

    IL_0011: ldloc.0
    IL_0012: ldarg.0
    IL_0013: ldlen
    IL_0014: conv.i4
    IL_0015: blt.s IL_0005
// end loop

IL_0017: ret

and second one - almost 100 lines, lots of allocations/deallocations, callings of virtual functions, lots of try/Dispose:

IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: ldc.i4.1
IL_0003: ldarg.0
IL_0004: ldlen
IL_0005: conv.i4
IL_0006: ldc.i4.1
IL_0007: sub
IL_0008: call class [mscorlib]System.Collections.Generic.IEnumerable`1<int32> [FSharp.Core]Microsoft.FSharp.Core.Operators/OperatorIntrinsics::RangeInt32(int32, int32, int32)
IL_000d: call class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0> [FSharp.Core]Microsoft.FSharp.Core.Operators::CreateSequence<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_0012: call class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Collections.SeqModule::ToList<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_0017: stloc.0
IL_0018: ldloc.0
IL_0019: unbox.any class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>
IL_001e: callvirt instance class [mscorlib]System.Collections.Generic.IEnumerator`1<!0> class [mscorlib]System.Collections.Generic.IEnumerable`1<int32>::GetEnumerator()
IL_0023: stloc.1
.try
{
    // loop start (head: IL_0024)
        IL_0024: ldloc.1
        IL_0025: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
        IL_002a: brfalse.s IL_003e

        IL_002c: ldloc.1
        IL_002d: callvirt instance !0 class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>::get_Current()
        IL_0032: stloc.3
        IL_0033: ldarg.0
        IL_0034: ldloc.3
        IL_0035: ldloc.3
        IL_0036: stelem.any [mscorlib]System.Int32
        IL_003b: nop
        IL_003c: br.s IL_0024
    // end loop

    IL_003e: ldnull
    IL_003f: stloc.2
    IL_0040: leave.s IL_005b
} // end .try
finally
{
    IL_0042: ldloc.1
    IL_0043: isinst [mscorlib]System.IDisposable
    IL_0048: stloc.s 4
    IL_004a: ldloc.s 4
    IL_004c: brfalse.s IL_0058

    IL_004e: ldloc.s 4
    IL_0050: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    IL_0055: ldnull
    IL_0056: pop
    IL_0057: endfinally

    IL_0058: ldnull
    IL_0059: pop
    IL_005a: endfinally
} // end handler

IL_005b: ldloc.2
IL_005c: pop
IL_005d: ret

My question is why does F# compiler uses so complicated code for foo2? Why does it use an IEnumerable to implement so trivial loop?

pad · Accepted Answer

In the 2nd example if you use range expression, it will be converted into normal for loop:

let foo2 (arr : int[]) = 
    for i in 0..arr.Length-1 do
        arr.[i] <- i

and become equivalent to foo1.

I quote Section 6.3.12 Range Expressions in F# language specs:

A sequence iteration expression of the form for var in expr1 .. expr2 do expr3 done is sometimes elaborated as a simple for loop-expression (§6.5.7).

However, your 2nd example is more like:

let foo2 (arr : int[]) = 
    let xs = [0..arr.Length-1] (* A new list is created *)
    for i in xs do
        arr.[i] <- i

where you have created a new list explicitly.

JaredPar · Answer

What your seeing is the standard difference between using an index based enumeration and an IEnumerable based enumeration (or in C# terms for vs foreach).

In the second sample the expression [0..arr.Length-1] is creating a collection and F# is using IEnumerable<T> to enumerate the values. Part of this enumeration style involves the use of IEnumerator<T> which implements IDisposable. The try / finally block you are seeing is generated to ensure that the IDisposable::Dispose method is called at the end of the enumeration even in the face of an exception.

Could F# optimize the second example into the first and avoid all of the extra overhead? It's possible that they could do this optimization. Essentially peek through the range expression, not it's just a simple numeric range and hence generate the equivalent for code.

Should F# optimize the second example. My vote would be no. Features like this often look trivial from the outside but actually implementing them, and more importantly maintaining them, can be rather expensive. An astute user could always convert their code back to the standard for version and avoid the IEnumerable<T> overhead (should the profiler reveal it to be an issue). Not implementing the optimization frees up the F# team to implement other awesome features.

F# "for loop" optimization

Tags:

optimization

for-loop

f#

qehgt

2 Answers

pad

JaredPar

Recent Activity

Donate For Us

F# "for loop" optimization

Tags:

optimization

for-loop

f#

qehgt

2 Answers

pad

JaredPar

Related questions

Recent Activity

Donate For Us