Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to fold lists of lazy types ([IO a])?

I don't know whether it is a valid term 'lazy types'. But still, IO is lazy so in

import Control.Monad
import Data.List

result :: IO Double
result = foldl1' (liftM2 (+)) $ map return [1..10000000]

result' :: IO Double
result' = return $ foldl1' (+) [1..10000000]

result is slow and uses a lot of memory, unlike result'. How shall I fold [IO a] ?

like image 324
Yrogirg Avatar asked Jan 21 '12 15:01

Yrogirg


2 Answers

result constructs one big IO Double value without evaluating any of the intermediate results, that only happens when the total result is demanded, e.g. for printing. foldl' evaluates the intermediate results to weak head normal form, that is, to the outermost constructor or lambda. Since (in GHC), IO a has the constructor IO, the intermediate results of the fold have the form

IO (some computation combined with another)

and the expression under the IO gets more complicated at each step.

To avoid that, you have to force not only the intermediate IO values, but also the values that they return,

main :: IO ()
main = foldlM' (\a -> fmap (a+)) 0 (map return [1.0 .. 10000000]) >>= print

foldlM' :: Monad m => (a -> b -> m a) -> a -> [b] -> m a
foldlM' foo a [] = return a
foldlM' foo a (b:bs) = do
    c <- foo a b
    c `seq` foldlM' foo c bs

works for your example.

like image 71
Daniel Fischer Avatar answered Sep 28 '22 16:09

Daniel Fischer


The problem is that foldl' only reduces the accumulator to WHNF at each step. In this case the accumulator is an IO action, and evaluating an IO action does not force the value within, so you end up with the typical huge thunk that doesn't get evaluated until the end.

The solution is to use something stricter than liftM2, for example:

result'' :: IO Double
result'' = foldl1' f $ map return [1..100000]
    where f mx my = do x <- mx; y <- my; return $! x + y

Here's a quick benchmark:

import Control.Monad
import Data.List
import Criterion.Main

result :: IO Double
result = foldl1' (liftM2 (+)) $ map return [1..100000]

result' :: IO Double
result' = return $ foldl1' (+) [1..100000]

result'' :: IO Double
result'' = foldl1' f $ map return [1..100000]
  where f mx my = do x <- mx; y <- my; return $! x + y

main = defaultMain [ bench "result" $ whnfIO result
                   , bench "result'" $ whnfIO result'
                   , bench "result''" $ whnfIO result'' ]

Result:

[...]
benchmarking result
collecting 100 samples, 1 iterations each, in estimated 37.32438 s
mean: 136.3221 ms, lb 131.4504 ms, ub 140.8238 ms, ci 0.950
std dev: 23.92297 ms, lb 22.00429 ms, ub 25.53803 ms, ci 0.950

benchmarking result'
collecting 100 samples, 14 iterations each, in estimated 6.046951 s
mean: 4.349027 ms, lb 4.338121 ms, ub 4.367363 ms, ci 0.950
std dev: 70.96316 us, lb 49.01322 us, ub 113.0399 us, ci 0.950

benchmarking result''
collecting 100 samples, 2 iterations each, in estimated 8.131099 s
mean: 41.89589 ms, lb 40.67513 ms, ub 43.52798 ms, ci 0.950
std dev: 7.194770 ms, lb 5.758892 ms, ub 8.529327 ms, ci 0.950

As you can see, this is still slower than the pure code, but not by as much.

like image 44
hammar Avatar answered Sep 28 '22 17:09

hammar