Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

An implementation problem of F# Seq

I am digging into F# source code recently.

in Seq.fs:

// Binding. 
//
// We use a type defintion to apply a local dynamic optimization. 
// We automatically right-associate binding, i.e. push the continuations to the right.
// That is, bindG (bindG G1 cont1) cont2 --> bindG G1 (cont1 o cont2)
// This makes constructs such as the following linear rather than quadratic:
//
//  let rec rwalk n = { if n > 0 then 
//                         yield! rwalk (n-1)
//                         yield n }

After seeing the above code, I tested two code:

let rec rwalk n = seq { if n > 0 then 
                         yield n
                         yield! rwalk (n-1)
                      }

and

let rec rwalk n = seq { if n > 0 then 
                         yield! rwalk (n-1)
                         yield n 
                      }

I found the first one is very fast, while the second is very slow. If n = 10000, it costs 3 seconds on my machine to generate this sequence, thus quadratic time.

The quadratic time is reasonable, as e.g.

seq { yield! {1; 2; ...; n-1}; yield n } translates to

Seq.append {1; 2; ...; n-1} {n}

This append operation should take linear time, I guess. While in the first code, the append operation is like this: seq { yield n; yield! {n-1; n-2; ...; 1} }, which costs constant time.

The the comments in code say that it is linear (maybe this linear is not linear time). Maybe this linear relates to using customized implementation for sequence rather than Moand/F# computation expression (as mentioned in F# specification, however the specification does not mention the reason for doing so...).

Could anyone clarify the fuzziness here? Thanks a lot!

(p.s. because this is a language design and optimization problem, I also attached Haskell tag to see if people there have insights. )

like image 670
Yin Zhu Avatar asked Nov 20 '10 09:11

Yin Zhu


1 Answers

When yield! appears in a non-tail-call position, it essentiall means the same thing as:

for v in <expr> do yield v

The problem with this (and the reason why is that quadratic) is that for recursive calls, this creates a chain of iterators with nested for loops. You need to iterate over the whole sequence generated by <expr> for every single element, so if the iteration is linear, you get a quadratic time (because the linear iteration happens for every element).

Let's say the rwalk function generates [ 9; 2; 3; 7 ]. In the first iteration, the recursively generated sequence has 4 elements, so you'd iterate over 4 elements and add 1. In the recursive call, you'd iterate over 3 elements and add 1, etc.. Using a diagram, you can see how that's quadratic:

x
x x 
x x x
x x x x

Also, each of the recursive calls creates a new instance of object (IEnumerator) so there is also some memory cost (although only linear).

In a tail-call position, the F# compiler/librar does an optimization. It "replaces" the current IEnumerable with the one returned by the recursive call, so it doesn't need to iterate overe it to generate all elements - it is simply returned (and this also removes the memory cost).

Related. The same problem has been discussed in the C# lanaugage design and there is an interesting paper about it (their name for yield! is yield foreach).

like image 182
Tomas Petricek Avatar answered Nov 07 '22 15:11

Tomas Petricek