I've been comparing for fun different languages for speed in execution of the following program: for i from 1 to 1000000 sum the product i*(sqrt i)
One of my implementations (not the only one) is constructing a list [1..1000000] and then folding with a specific function.
The program works fine and fast in Haskell (even when using foldl and not foldl') but stack overflows in OCaml and F#.
Here is the Haskell code:
test = foldl (\ a b -> a + b * (sqrt b)) 0
create 0 = []
create n = n:(create (n-1))
main = print (test (create 1000000))
And here is the OCaml one:
let test = List.fold_left (fun a b -> a +. (float_of_int b) *. (sqrt (float_of_int b))) 0. ;;
let rec create = function
| 0 -> []
| n -> n::(create (n-1)) ;;
print_float (test (create 1000000));;
Why does the OCaml/F# implementation stack overflows?
The Haskell code for create
evaluates the list lazily, i.e. as elements are needed by foldl
. The entire list is not needed all at once.
By contrast, F# and OCaml evaluate the create
list strictly, i.e. the code tries to generate the entire list in one go before passing it to fold_left
.
One possibility in F# is to use a seq
expression in the create
function: this generates the list lazily in the same way as Haskell. (I'm not sure if OCaml has an equivalent feature.)
Firstly, if you're doing performance comparisons for numeric stuff, lists aren't the best choice. Try a package like the vector package for fast arrays.
And note that you can do even better in Haskell, thanks to loop fusion. By writing the create function as an enumeration the compiler can combine the create step, and the fold loop, into a single loop that allocates no intermediate data structures. The ability to do general fusion like this is unique to GHC Haskell.
I'll use the vector library (stream-based loop fusion):
import qualified Data.Vector as V
test = V.foldl (\ a b -> a + b * sqrt b) 0
create n = (V.enumFromTo 1 n)
main = print (test (create 1000000))
Now, before, with your code, the compiler is unable to remove all lists, and we end up with an inner loop like:
$wlgo :: Double# -> [Double] -> Double#
$wlgo =
\ (ww_sww :: Double#) (w_swy :: [Double]) ->
case w_swy of _ {
[] -> ww_sww;
: x_aoY xs_aoZ ->
case x_aoY of _ { D# x1_aql ->
$wlgo
(+##
ww_sww (*## x1_aql (sqrtDouble# x1_aql)))
xs_aoZ
}
}
$wcreate :: Double# -> [Double]
$wcreate =
\ (ww_swp :: Double#) ->
case ==## ww_swp 0.0 of _ {
False ->
:
@ Double
(D# ww_swp)
($wcreate (-## ww_swp 1.0));
True -> [] @ Double
}
Note how there are two loops: create generating a (lazy) list, and the fold consuming it. Thanks to laziness, the cost of that list is cheap, so it runs in a respectable:
$ time ./C
4.000004999999896e14
./C 0.06s user 0.00s system 98% cpu 0.058 total
Under fusion, however, we get instead a single loop only!
main_$s$wfoldlM_loop :: Double# -> Double# -> Double#
main_$s$wfoldlM_loop =
\ (sc_sYc :: Double#) (sc1_sYd :: Double#) ->
case <=## sc_sYc 1000000.5 of _ {
False -> sc1_sYd;
True ->
main_$s$wfoldlM_loop
(+## sc_sYc 1.0)
(+##
sc1_sYd (*## sc_sYc (sqrtDouble# sc_sYc)))
GHC reduced our create and test steps into a single loop with no lists used. Just 2 doubles in registers. And with half as many loops, it runs nearly twice as fast:
$ ghc D.hs -Odph -fvia-C -optc-O3 -optc-march=native -fexcess-precision --make
$ time ./D
4.000005000001039e14
./D 0.04s user 0.00s system 95% cpu 0.038 total
This is a nice example of the power that guarantees of purity provide -- the compiler can be very aggressive at reording your code.
Another way to fix the code so that it works in F# is to use sequence expressions that are also generated lazily and in a way that doesn't cause StackOverflow
(this won't work in OCaml, because sequence expressions are F# specific). Here is the code:
let test nums = Seq.fold (fun a b ->
a + (float b) * (sqrt (float b))) 0.0 nums
let rec create count = seq {
match count with
| 0 -> do ()
| n -> yield n
yield! create (n-1) }
printf "%f" (test (create 1000000));;
I'm not sure about the performance, however the compiler definitely does an optimization in the create
function, so that iterating over the generated sequence should be reasonably fast. However, the aim of sequence expressions is mainly readability (since F# is not pure, it gives you a lot of space for manual optimizations if you really need them as opposed to Haskell which needs to optimize the pure code you write). Anyway, if you have some tests, you can give it a try!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With