Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Seq.zip in F# may require seemingly extra sequence element to complete

Tags:

haskell

f#

Let's Seq.zip two F# sequences, one represented by a list and the other - by Seq.filter applied to an infinite sequence:

Seq.initInfinite (fun i -> i)
|> Seq.filter ((>) 3)
|> Seq.zip ["A";"B"]

returns as expected

val it : seq<string * int> = seq [("A", 0); ("B", 1)]

However,

Seq.initInfinite (fun i -> i)
|> Seq.filter ((>) 2)
|> Seq.zip ["A";"B"]

hangs on trying to get non-existing 3rd member that can pass Seq.filter and eventually blows up fsi:

 Error: Enumeration based on System.Int32 exceeded System.Int32.MaxValue.

although the other argument represented by list of letters hints that just two filtered elements would be enough to perform zip to the function specification.

If we turn to Haskell for implementation comparison the equivalent

 zip ["A","B"] (filter (<2) [0..])

completes without any problem yielding

[("A",0),("B",1)]

As Haskell implementation behavior seem intuitively right what is the justification for the observed behavior of F# Seq.zip implementation?

UPDATE:

I did not notice that Haskell

zip (filter (<2) [0..]) ["A","B"]

does not complete similarly to F#.

Bottom line: It is impossible to implement Zip function capable of zipping sequences of defined and undefined lengths in argument order-agnostic manner. F# Zip implementation simply prefers invariant to argument order behavior over Haskell argument order dependent one.

like image 279
Gene Belitski Avatar asked Feb 02 '23 14:02

Gene Belitski


2 Answers

The reason why it doesn't hang in Haskell is because the implementation of zip just so happens to be stricter in it's first argument than in it's second.

zip :: [a] -> [b] -> [(a,b)]
zip (a:as) (b:bs) = (a,b) : zip as bs
zip _      _      = []

Since the patterns are checked from left to right, this gives the following behavior.

*Main> zip [] undefined
[]
*Main> zip undefined []
*** Exception: Prelude.undefined

Since filter (<2) [0..] is semantically equivalent to 0 : 1 : ⊥, your example is after two iterations

("A", 0) : ("B", 1) : zip [] undefined =
("A", 0) : ("B", 1) : []

If we change the order of the arguments to zip (filter (<2) [0..]) ["A", "B"], we get

(0, "A") : (1, "B") : zip undefined [] =
(0, "A") : (1, "B") : undefined

I don't know much about F#, but I suspect something similar is happening there.

Note that there is no way to define zip such that zip [] undefined and zip undefined [] both return [], since you have to check one of the arguments first, and it's impossible to check against ⊥ due to monotonicity.

like image 110
hammar Avatar answered Feb 16 '23 04:02

hammar


I don't know how Haskell does it, and I do agree that it seems intuitively correct (except I wonder what would happen in Haskell if you switched the fixed length list and the indeterminate length list), but I can show you why it works this way in F#. You can see in the F# source code file seq.fs that the significant implementation detail is in IEnumerable.map2:

  let map2 f (e1 : IEnumerator<_>) (e2 : IEnumerator<_>) : IEnumerator<_>=
      upcast 
          {  new MapEnumerator<_>() with
                 member this.DoMoveNext curr = 
                    let n1 = e1.MoveNext()
                    let n2 = e2.MoveNext()
                    if n1 && n2 then
                       curr <- f e1.Current e2.Current
                       true
                    else 
                       false
                 member this.Dispose() = e1.Dispose(); e2.Dispose()
          }

So Seq.zip will try to move both sequences to their third element before deciding whether the zip is complete, thus Seq.initInfinite (fun i -> i) |> Seq.filter ((>) 2) gets stuck trying to find the third element "forever" (until Error: Enumeration based on System.Int32 exceeded System.Int32.MaxValue).

like image 30
Stephen Swensen Avatar answered Feb 16 '23 02:02

Stephen Swensen