Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this F# sequence function not tail recursive?

Disclosure: this came up in FsCheck, an F# random testing framework I maintain. I have a solution, but I do not like it. Moreover, I do not understand the problem - it was merely circumvented.

A fairly standard implementation of (monadic, if we're going to use big words) sequence is:

let sequence l = 
    let k m m' = gen { let! x = m
                       let! xs = m'
                       return (x::xs) }
    List.foldBack k l (gen { return [] })

Where gen can be replaced by a computation builder of choice. Unfortunately, that implementation consumes stack space, and so eventually stack overflows if the list is long enough.The question is: why? I know in principle foldBack is not tail recursive, but the clever bunnies of the F# team have circumvented that in the foldBack implementation. Is there a problem in the computation builder implementation?

If I change the implementation to the below, everything is fine:

let sequence l =
    let rec go gs acc size r0 = 
        match gs with
        | [] -> List.rev acc
        | (Gen g)::gs' ->
            let r1,r2 = split r0
            let y = g size r1
            go gs' (y::acc) size r2
    Gen(fun n r -> go l [] n r)

For completeness, the Gen type and computation builder can be found in the FsCheck source

like image 248
Kurt Schelfthout Avatar asked May 30 '11 20:05

Kurt Schelfthout


People also ask

Why do S's look like F's?

Why in old English text was an 's' written as an 'f'? It wasn't; it was just written differently according to its position in the word. The f-like s (like an f without the crossbar) was a tall variant used at the start or in the middle of a word, which the modern s was used at the end or after a tall s.

What do you call the F symbol?

The letter F with a hook (majuscule Ƒ, minuscule: ƒ) is a letter of the Latin script, based on the italic form of f; or on its regular form with a descender hook added.

How do you type F with a hook?

The Alt Code for ƒ is Alt 402. If you have a keyboard with a numeric pad, you can use this method. Simply hold down the Alt Key and type 402.

What is the function symbol?

A function, its domain, and its codomain, are declared by the notation f: X→Y, and the value of a function f at an element x of X, denoted by f(x), is called the image of x under f, or the value of f applied to the argument x.


2 Answers

Building on Tomas's answer, let's define two modules:

module Kurt = 
    type Gen<'a> = Gen of (int -> 'a)

    let unit x = Gen (fun _ -> x)

    let bind k (Gen m) =     
        Gen (fun n ->       
            let (Gen m') = k (m n)       
            m' n)

    type GenBuilder() =
        member x.Return(v) = unit v
        member x.Bind(v,f) = bind f v

    let gen = GenBuilder()


module Tomas =
    type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)

    let unit x = Gen (fun _ f -> f x)

    let bind k (Gen m) =     
        Gen (fun n f ->       
            m n (fun r ->         
                let (Gen m') = k r        
                m' n f))

    type GenBuilder() =
        member x.Return v = unit v
        member x.Bind(v,f) = bind f v

    let gen = GenBuilder()

To simplify things a bit, let's rewrite your original sequence function as

let rec sequence = function
| [] -> gen { return [] }
| m::ms -> gen {
    let! x = m
    let! xs = sequence ms
    return x::xs }

Now, sequence [for i in 1 .. 100000 -> unit i] will run to completion regardless of whether sequence is defined in terms of Kurt.gen or Tomas.gen. The issue is not that sequence causes a stack overflow when using your definitions, it's that the function returned from the call to sequence causes a stack overflow when it is called.

To see why this is so, let's expand the definition of sequence in terms of the underlying monadic operations:

let rec sequence = function
| [] -> unit []
| m::ms ->
    bind (fun x -> bind (fun xs -> unit (x::xs)) (sequence ms)) m

Inlining the Kurt.unit and Kurt.bind values and simplifying like crazy, we get

let rec sequence = function
| [] -> Kurt.Gen(fun _ -> [])
| (Kurt.Gen m)::ms ->
    Kurt.Gen(fun n ->
            let (Kurt.Gen ms') = sequence ms
            (m n)::(ms' n))

Now it's hopefully clear why calling let (Kurt.Gen f) = sequence [for i in 1 .. 1000000 -> unit i] in f 0 overflows the stack: f requires a non-tail-recursive call to sequence and evaluation of the resulting function, so there will be one stack frame for each recursive call.

Inlining Tomas.unit and Tomas.bind into the definition of sequence instead, we get the following simplified version:

let rec sequence = function
| [] -> Tomas.Gen (fun _ f -> f [])
| (Tomas.Gen m)::ms ->
    Tomas.Gen(fun n f ->  
        m n (fun r ->
            let (Tomas.Gen ms') = sequence ms
            ms' n (fun rs ->  f (r::rs))))

Reasoning about this variant is tricky. You can empirically verify that it won't blow the stack for some arbitrarily large inputs (as Tomas shows in his answer), and you can step through the evaluation to convince yourself of this fact. However, the stack consumption depends on the Gen instances in the list that's passed in, and it is possible to blow the stack for inputs that aren't themselves tail recursive:

// ok
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> unit i]
f 0 (fun list -> printfn "%i" list.Length)

// not ok...
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> Gen(fun _ f -> f i; printfn "%i" i)]
f 0 (fun list -> printfn "%i" list.Length)
like image 90
kvb Avatar answered Oct 28 '22 02:10

kvb


You're correct - the reason why you're getting a stack overflow is that the bind operation of the monad needs to be tail-recursive (because it is used to aggregate values during folding).

The monad used in FsCheck is essentially a state monad (it keeps the current generator and some number). I simplified it a bit and got something like:

type Gen<'a> = Gen of (int -> 'a)

let unit x = Gen (fun n -> x)

let bind k (Gen m) = 
    Gen (fun n -> 
      let (Gen m') = k (m n) 
      m' n)

Here, the bind function is not tail-recursive because it calls k and then does some more work. You can change the monad to be a continuation monad. It is implemented as a function that takes the state and a continuation - a function that is called with the result as an argument. For this monad, you can make bind tail recursive:

type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)

let unit x = Gen (fun n f -> f x)

let bind k (Gen m) = 
    Gen (fun n f -> 
      m n (fun r -> 
        let (Gen m') = k r
        m' n f))

The following example will not stack overflow (and it did with the original implementation):

let sequence l = 
  let k m m' = 
    m |> bind (fun x ->
      m' |> bind (fun xs -> 
        unit (x::xs)))
  List.foldBack k l (unit [])

let (Gen f) = sequence [ for i in 1 .. 100000 -> unit i ]
f 0 (fun list -> printfn "%d" list.Length)
like image 35
Tomas Petricek Avatar answered Oct 28 '22 00:10

Tomas Petricek