Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stack overflow in OCaml and F# but not in Haskell

I've been comparing for fun different languages for speed in execution of the following program: for i from 1 to 1000000 sum the product i*(sqrt i)

One of my implementations (not the only one) is constructing a list [1..1000000] and then folding with a specific function.

The program works fine and fast in Haskell (even when using foldl and not foldl') but stack overflows in OCaml and F#.

Here is the Haskell code:

test = foldl (\ a b -> a + b * (sqrt b)) 0

create 0 = []
create n = n:(create (n-1))

main = print (test (create 1000000))

And here is the OCaml one:

let test = List.fold_left (fun a b -> a +. (float_of_int b) *. (sqrt (float_of_int b))) 0. ;;

let rec create = function
    | 0 -> []
    | n -> n::(create (n-1)) ;;

print_float (test (create 1000000));;

Why does the OCaml/F# implementation stack overflows?

like image 389
FDP Avatar asked Feb 20 '10 14:02

FDP


3 Answers

The Haskell code for create evaluates the list lazily, i.e. as elements are needed by foldl. The entire list is not needed all at once.

By contrast, F# and OCaml evaluate the create list strictly, i.e. the code tries to generate the entire list in one go before passing it to fold_left.

One possibility in F# is to use a seq expression in the create function: this generates the list lazily in the same way as Haskell. (I'm not sure if OCaml has an equivalent feature.)

like image 68
Tim Robinson Avatar answered Sep 17 '22 20:09

Tim Robinson


Firstly, if you're doing performance comparisons for numeric stuff, lists aren't the best choice. Try a package like the vector package for fast arrays.

And note that you can do even better in Haskell, thanks to loop fusion. By writing the create function as an enumeration the compiler can combine the create step, and the fold loop, into a single loop that allocates no intermediate data structures. The ability to do general fusion like this is unique to GHC Haskell.

I'll use the vector library (stream-based loop fusion):

import qualified Data.Vector as V

test = V.foldl (\ a b -> a + b * sqrt b) 0

create n = (V.enumFromTo 1 n)

main = print (test (create 1000000))

Now, before, with your code, the compiler is unable to remove all lists, and we end up with an inner loop like:

$wlgo :: Double# -> [Double] -> Double#

$wlgo =
  \ (ww_sww :: Double#) (w_swy :: [Double]) ->
    case w_swy of _ {
      [] -> ww_sww;
      : x_aoY xs_aoZ ->
        case x_aoY of _ { D# x1_aql ->
        $wlgo
          (+##
             ww_sww (*## x1_aql (sqrtDouble# x1_aql)))
          xs_aoZ
        }
    }

$wcreate :: Double# -> [Double]

$wcreate =
  \ (ww_swp :: Double#) ->
    case ==## ww_swp 0.0 of _ {
      False ->
        :
          @ Double
          (D# ww_swp)
          ($wcreate (-## ww_swp 1.0));
      True -> [] @ Double
    }

Note how there are two loops: create generating a (lazy) list, and the fold consuming it. Thanks to laziness, the cost of that list is cheap, so it runs in a respectable:

$ time ./C
4.000004999999896e14
./C  0.06s user 0.00s system 98% cpu 0.058 total

Under fusion, however, we get instead a single loop only!

main_$s$wfoldlM_loop :: Double# -> Double# -> Double#

main_$s$wfoldlM_loop =
  \ (sc_sYc :: Double#) (sc1_sYd :: Double#) ->
    case <=## sc_sYc 1000000.5 of _ {
      False -> sc1_sYd;
      True ->
        main_$s$wfoldlM_loop
          (+## sc_sYc 1.0)
          (+##
             sc1_sYd (*## sc_sYc (sqrtDouble# sc_sYc)))

GHC reduced our create and test steps into a single loop with no lists used. Just 2 doubles in registers. And with half as many loops, it runs nearly twice as fast:

$ ghc D.hs -Odph -fvia-C -optc-O3 -optc-march=native -fexcess-precision --make
$ time ./D
4.000005000001039e14
./D  0.04s user 0.00s system 95% cpu 0.038 total

This is a nice example of the power that guarantees of purity provide -- the compiler can be very aggressive at reording your code.

like image 44
Don Stewart Avatar answered Sep 17 '22 20:09

Don Stewart


Another way to fix the code so that it works in F# is to use sequence expressions that are also generated lazily and in a way that doesn't cause StackOverflow (this won't work in OCaml, because sequence expressions are F# specific). Here is the code:

let test nums = Seq.fold (fun a b -> 
  a + (float b) * (sqrt (float b))) 0.0 nums

let rec create count = seq {
  match count with
  | 0 -> do ()
  | n -> yield n
         yield! create (n-1) }

printf "%f" (test (create 1000000));; 

I'm not sure about the performance, however the compiler definitely does an optimization in the create function, so that iterating over the generated sequence should be reasonably fast. However, the aim of sequence expressions is mainly readability (since F# is not pure, it gives you a lot of space for manual optimizations if you really need them as opposed to Haskell which needs to optimize the pure code you write). Anyway, if you have some tests, you can give it a try!

like image 44
Tomas Petricek Avatar answered Sep 18 '22 20:09

Tomas Petricek