Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why sequence expressions for arrays are so slow in F#?

Tags:

.net

f#

The code:

#time "on"

let newVector = [| 
  for v in 1..10000000 ->
    v |] 

let newVector2 = 
  let a = Array.zeroCreate 10000000
  for v in 1..10000000 do
    a.[v-1] <- v
  a

let newVector3 = 
  let a = System.Collections.Generic.List() // do not set capacity
  for v in 1..10000000 do
    a.Add(v)
  a.ToArray()

gives timing in FSI:

--> Timing now on

> 
Real: 00:00:01.121, CPU: 00:00:01.156, GC gen0: 4, gen1: 4, gen2: 4

val newVector : int [] =  [|1; 2; 3; 4; ...|]

> 
Real: 00:00:00.024, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0

val newVector2 : int32 [] =  [|1; 2; 3; 4; ...|]

> 
Real: 00:00:00.173, CPU: 00:00:00.156, GC gen0: 2, gen1: 2, gen2: 2

val newVector3 : int32 [] =  [|1; 2; 3; 4;  ...|]

Not a big difference in a standalone app, release mode, without debug, 5 runs average:

  • Sequence expression: 425
  • Preallocate: 13
  • List: 80

The third method doesn't know the original capacity but is still almost 7x faster. Why sequence expressions for arrays are so slow in F#?

Update:

let seq = seq { for v in 1..10000000 do yield v }
let seqArr = seq |> Seq.toArray
Real: 00:00:01.060, CPU: 00:00:01.078, GC gen0: 2, gen1: 2, gen2: 2

let newVector4 = 
  let a = System.Collections.Generic.List() // do not set capacity
  for v in seq do
    a.Add(v)
  a.ToArray()
Real: 00:00:01.119, CPU: 00:00:01.109, GC gen0: 1, gen1: 1, gen2: 1

open System.Linq
let newVector5 =  seq.ToArray()
Real: 00:00:00.969, CPU: 00:00:00.968, GC gen0: 0, gen1: 0, gen2: 0

this gives the same timing as the first one and doesn't depend on GC. So the real question is, why enumerating 1..10000000 is so much slower that for loop in the second and the third case?

Update 2

open System
open System.Linq
open System.Collections.Generic
let newVector5 =  seq.ToArray()

let ie count = 
  { new IEnumerable<int> with
      member x.GetEnumerator(): Collections.IEnumerator = x.GetEnumerator() :> Collections.IEnumerator

      member x.GetEnumerator(): IEnumerator<int> = 
        let c = ref 0
        { new IEnumerator<int> with
            member y.MoveNext() = 
              if !c < count then
                c := !c + 1
                true
              else false

            member y.Current with get() = !c + 1
            member y.Current with get() = !c + 1 :> obj
            member y.Dispose() = () 
            member y.Reset() = ()       
        }
  }


let newVector6 = 
  let a = System.Collections.Generic.List() // do not set capacity
  for v in ie 10000000 do
    a.Add(v)
  a.ToArray()
Real: 00:00:00.185, CPU: 00:00:00.187, GC gen0: 1, gen1: 1, gen2: 1

Manual implementation of IEnumerable is equivalent to for loop. I wonder why expansion of lo..hi should be so much slower for a generic case. It could be implemented by method overloads at least for the most common types.

like image 439
V.B. Avatar asked Jun 05 '15 17:06

V.B.


2 Answers

In cases like this I always check the generated code using one of the many good decompilers in .NET.

let explicitArray () = 
  let a = Array.zeroCreate count
  for v in 1..count do
    a.[v-1] <- v
  a

This is compiled into the equivalent C#:

public static int[] explicitArray()
{
    int[] a = ArrayModule.ZeroCreate<int>(10000000);
    for (int v = 1; v < 10000001; v++)
    {
        a[v - 1] = v;
    }
    return a;
}

This is about as efficient it gets.

let arrayExpression () = 
  [|  for v in 1..count -> v |] 

This on the other hand becomes:

public static int[] arrayExpression()
{
    return SeqModule.ToArray<int>(new Program.arrayExpression@7(0, null, 0, 0));
}

This equivalent to:

let arrayExpression () = 
  let e = seq { for v in 1..count -> v }
  let a = List() // do not set capacity
  for v in e do
    a.Add(v)
  a.ToArray()

When iterating over a seq (alias for IEnumerable) first MoveNext is called, then Current. These are virtual calls which the JIT:er seldomly can inline. Checking the JIT:ed assembly code we see this:

mov         rax,qword ptr [rbp+10h]  
cmp         byte ptr [rax],0  
mov         rcx,qword ptr [rbp+10h]  
lea         r11,[7FFC07830030h]  
# virtual call .MoveNext
call        qword ptr [7FFC07830030h]  
movzx       ecx,al  
# if .MoveNext returns false then exit
test        ecx,ecx  
je          00007FFC079408A0  
mov         rcx,qword ptr [rbp+10h]  
lea         r11,[7FFC07830038h]  
# virtual call .Current
call        qword ptr [7FFC07830038h]  
mov         edx,eax  
mov         rcx,rdi  
# call .Add
call        00007FFC65C8B300  
# loop
jmp         00007FFC07940863  

If we compare this with the JIT:ed code for the code that used ResizeArray (List)

lea         edx,[rdi-1]  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
mov         edx,edi  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
lea         edx,[rdi+1]  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
lea         edx,[rdi+2]  
mov         rcx,rbx  
# call .Add
call        00007FFC65C8B300  
add         edi,4  
# loop
cmp         edi,989682h  
jl          00007FFC07910384  

Here the JIT:er has unrolled the loop 4 times here and we have only non-virtual calls to List.Add.

This explains why F# array expressions are slower than your other two examples.

In order to fix this one would have to fix the optimizer in F# to recognize the shape of expressions such as this:

seq { for v in 1..count -> v } |> Seq.toArray

And optimize them into:

let a = Array.zeroCreate count
for v in 1..count do
    a.[v-1] <- v
a

The challenge is to find an optimization that is general enough to be useful but also doesn't break the semantics of F#.

like image 160
Just another metaprogrammer Avatar answered Nov 09 '22 05:11

Just another metaprogrammer


For grins, I dumped your array comprehension into ILDASM to get a look at what happens. I put the let into main and got this:

.locals init ([0] int32[] newVector,
       [1] class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<string[],class [FSharp.Core]Microsoft.FSharp.Core.Unit> V_1,
       [2] string[] V_2)
IL_0000:  nop
IL_0001:  ldc.i4.0
IL_0002:  ldnull
IL_0003:  ldc.i4.0
IL_0004:  ldc.i4.0
IL_0005:  newobj     instance void Program/newVector@11::.ctor(int32,
                                                             class [mscorlib]System.Collections.Generic.IEnumerator`1<int32>,
                                                             int32,
                                                             int32)
IL_000a:  call       !!0[] [FSharp.Core]Microsoft.FSharp.Collections.SeqModule::ToArray<int32>(class [mscorlib]System.Collections.Generic.IEnumerable`1<!!0>)
IL_000f:  stloc.0

So an instance of newVector@11 gets created and that class inherits from GeneratedSequenceBase, which itself, implements IENumerable. This makes sense because next there is a call to Seq.ToArray. Looking at that class, there is no way to determine whether or not the length of the sequence is known, even though it is knowable in this case, because of the nature of IEnumerable. This tells me that it should be equivalent to:

let seqArr = seq { for v in 1..10000000 do yield v } |> Seq.toArray

and the timing bears this out:

Real: 00:00:01.032, CPU: 00:00:01.060, GC gen0: 4, gen1: 4, gen2: 4

for a final bit of fun, I put the above sequence comprehension through ILDASM and the wrapper class and GenerateNext method within it are instruction for instruction identical to the array comprehension. So I think that it is very safe to conclude that any array comprehension of the form:

let arr = [| sequence-expr |]

is 100% equivalent to:

let arr = seq { sequence-expr } |> Seq.toArray

And this is hinted at in the F# array documentation with the following line:

You can also use sequence expressions to create arrays. Following is an example that creates an array of squares of integers from 1 to 10.

which is really saying "it's a sequence, not its own thing."

like image 25
plinth Avatar answered Nov 09 '22 05:11

plinth