Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoiding stack overflow (with F# infinite sequences of sequences)

I have this "learning code" I wrote for the morris seq in f# that suffers from stack overflow that I don't know how to avoid. "morris" returns an infinite sequence of "see and say" sequences (i.e., {{1}, {1,1}, {2,1}, {1,2,1,1}, {1,1,1,2,2,1}, {3,1,2,2,1,1},...}).

    let printList l =         Seq.iter (fun n -> printf "%i" n) l         printfn ""      let rec morris s =          let next str = seq {             let cnt = ref 1  // Stack overflow is below when enumerating             for cur in [|0|] |> Seq.append str |> Seq.windowed 2 do                 if cur.[0] <> cur.[1] then                     yield!( [!cnt ; cur.[0]] )                     cnt := 0                 incr cnt         }         seq {         yield s         yield! morris (next s) // tail recursion, no stack overflow         }      // "main"     // Print the nth iteration     let _ =  [1] |> morris |> Seq.nth 3125 |> printList  

You can pick off the nth iteration using Seq.nth but you can only get so far before you hit a stack overflow. The one bit of recursion I have is tail recursion and it in essence builds a linked set of enumerators. That's not where the problem is. It's when "enum" is called on the say the 4000th sequence. Note that's with F# 1.9.6.16, the previous version topped out above 14000). It's because the way the linked sequences are resolved. The sequences are lazy and so the "recursion" is lazy. That is, seq n calls seq n-1 which calls seq n-2 and so forth to get the first item (the very first # is the worst case).

I understand that [|0|] |> Seq.append str |> Seq.windowed 2, is making my problem worse and I could triple the # I could generate if I eliminated that. Practically speaking the code works well enough. The 3125th iteration of morris would be over 10^359 characters in length.

The problem I'm really trying to solve is how to retain the lazy eval and have a no limit based on stack size for the iteration I can pick off. I'm looking for the proper F# idiom to make the limit based on memory size.

Update Oct '10

After learning F# a bit better, a tiny bit of Haskell, thinking & investigating this problem for over year, I finally can answer my own question. But as always with difficult problems, the problem starts with it being the wrong question. The problem isn't sequences of sequences - it's really because of a recursively defined sequence. My functional programming skills are a little better now and so it's easier to see what's going on with the version below, which still gets a stackoverflow

let next str =      Seq.append str [0]     |> Seq.pairwise     |> Seq.scan (fun (n,_) (c,v) ->             if (c = v) then (n+1,Seq.empty)             else (1,Seq.ofList [n;c]) ) (1,Seq.empty)     |> Seq.collect snd  let morris = Seq.unfold(fun sq -> Some(sq,next sq)) 

That basicially creates a really long chain of Seq processing function calls to generate the sequnces. The Seq module that comes with F# is what can't follow the chain without using the stack. There's an optimization it uses for append and recursively defined sequences, but that optimization only works if the recursion is implementing an append.

So this will work

let rec ints n = seq { yield n; yield! ints (n+1) } printf "%A" (ints 0 |> Seq.nth 100000);; 

And this one will get a stackoverflow.

let rec ints n = seq { yield n; yield! (ints (n+1)|> Seq.map id) } printf "%A" (ints 0 |> Seq.nth 100000);; 

To prove the F# libary was the issue, I wrote my own Seq module that implemented append, pairwise, scan and collect using continutions and now I can begin generating and printing out the 50,000 seq without a problem (it'll never finish since it's over 10^5697 digits long).

Some additional notes:

  • Continuations were the idiom I was looking for, but in this case, they had to go into the F# library, not my code. I learned about continuations in F# from Tomas Petricek's Real-World Functional Programming book.
  • The lazy list answer that I accepted held the other idiom; lazy evaluation. In my rewritten library, I also had to leverage the lazy type to avoid stackoverflow.
  • The lazy list version sorta of works by luck (maybe by design but that's beyond my current ability to determine) - the active-pattern matching it uses while it's constructing and iterating causes the lists to calculate values before the required recursion gets too deep, so it's lazy, but not so lazy it needs continuations to avoid stackoverflow. For example, by the time the 2nd sequence needs a digit from the 1st sequence, it's already been calculated. In other words, the LL version is not strictly JIT lazy for sequence generation, only list management.
like image 715
Tony Lee Avatar asked May 22 '09 17:05

Tony Lee


People also ask

How would you avoid a stack overflow?

Avoid or strictly limit recursion. Don't break your programs up too far into smaller and smaller functions - even without counting local variables each function call consumes as much as 64 bytes on the stack (32 bit processor, saving half the CPU registers, flags, etc)

Does scanf prevent buffer overflow?

The 'scanf()' functions can lead to buffer overflow if used improperly. They do not have bound checking capability and if the input string is longer than the buffer size, then the characters will overflow into the adjoining memory. It is possible to avoid buffer overflow by specifying a field width.

Is scanf safe in C?

There is a major security flaw in scanf family( scanf , sscanf , fscanf ..etc) esp when reading a string, because they don't take the length of the buffer (into which they are reading) into account. Example: char buf[3]; sscanf("abcdef","%s",buf); clearly the the buffer buf can hold MAX 3 char.


1 Answers

You should definitely check out

http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.PowerPack/Microsoft.FSharp.Collections.LazyList.html

but I will try to post a more comprehensive answer later.

UPDATE

Ok, a solution is below. It represents the Morris sequence as a LazyList of LazyLists of int, since I presume you want it to be lazy in 'both directions'.

The F# LazyList (in the FSharp.PowerPack.dll) has three useful properties:

  • it is lazy (evaluation of the nth element will not happen until it is first demanded)
  • it does not recompute (re-evaluation of the nth element on the same object instance will not recompute it - it caches each element after it's first computed)
  • you can 'forget' prefixes (as you 'tail' into the list, the no-longer-referenced prefix is available for garbage collection)

The first property is common with seq (IEnumerable), but the other two are unique to LazyList and very useful for computational problems such as the one posed in this question.

Without further ado, the code:

// print a lazy list up to some max depth let rec PrintList n ll =     match n with     | 0 -> printfn ""     | _ -> match ll with            | LazyList.Nil -> printfn ""            | LazyList.Cons(x,xs) ->                printf "%d" x                PrintList (n-1) xs  // NextMorris : LazyList<int> -> LazyList<int> let rec NextMorris (LazyList.Cons(cur,rest)) =      let count = ref 1     let ll = ref rest     while LazyList.nonempty !ll && (LazyList.hd !ll) = cur do         ll := LazyList.tl !ll         incr count     LazyList.cons !count         (LazyList.consf cur (fun() ->             if LazyList.nonempty !ll then                 NextMorris !ll             else                 LazyList.empty()))  // Morris : LazyList<int> -> LazyList<LazyList<int>> let Morris s =     let rec MakeMorris ll =         LazyList.consf ll (fun () ->             let next = NextMorris ll             MakeMorris next         )     MakeMorris s  // "main" // Print the nth iteration, up to a certain depth [1] |> LazyList.of_list |> Morris |> Seq.nth 3125 |> PrintList 10 [1] |> LazyList.of_list |> Morris |> Seq.nth 3126 |> PrintList 10 [1] |> LazyList.of_list |> Morris |> Seq.nth 100000 |> PrintList 35 [1] |> LazyList.of_list |> Morris |> Seq.nth 100001 |> PrintList 35 

UPDATE2

If you just want to count, that's fine too:

let LLLength ll =     let rec Loop ll acc =         match ll with         | LazyList.Cons(_,rest) -> Loop rest (acc+1N)         | _ -> acc     Loop ll 0N  let Main() =     // don't do line below, it leaks     //let hundredth = [1] |> LazyList.of_list |> Morris |> Seq.nth 100     // if we only want to count length, make sure we throw away the only     // copy as we traverse it to count     [1] |> LazyList.of_list |> Morris |> Seq.nth 100         |> LLLength |> printfn "%A"  Main()     

The memory usage stays flat (under 16M on my box)... hasn't finished running yet, but I computed the 55th length fast, even on my slow box, so I think this should work just fine. Note also that I used 'bignum's for the length, since I think this will overflow an 'int'.

like image 75
Brian Avatar answered Sep 19 '22 08:09

Brian