How can I fold a lazy list using a monadic action in constant space? The problem I'm trying to solve is aggregating a large file, and I believe for the sake of performance I require mutability. I have an implementation working in ST using mutable vectors, but it uses too much memory. Below is an example of what I'm attempting. I also experimented briefly with Conduit but that didn't appear to provide any improvement.
ST forM_:
import Control.Monad (forM_)
import Control.Monad.ST.Trans as STT
import Control.Monad.Identity as Identity
testST :: Int
testST = do
Identity.runIdentity $ STT.runST $ do
a <- STT.newSTRef 0
forM_ [1..10000000] (\x -> do
a' <- STT.readSTRef a
STT.writeSTRef a (a' + x)
)
STT.readSTRef a
Conduit:
import Data.Conduit (($=),(=$),($$))
import qualified Data.Conduit as C
import qualified Data.Conduit.List as CL
testCL :: IO Int
testCL = CL.sourceList [1..10000000] $$ CL.foldM (\a x -> return (a + x)) 0
The problem is not with the fold, but with the fold body. This program allocates a lot:
testST = runST $ do
ref <- newSTRef 0
forM_ [1..10000000] $ \x -> do
val <- readSTRef ref
writeSTRef ref (val + x)
readSTRef ref
This program, whose only difference is on the writeSTRef line, allocates almost nothing:
testST = runST $ do
ref <- newSTRef 0
forM_ [1..10000000] $ \x -> do
val <- readSTRef ref
writeSTRef ref $! val + x
readSTRef ref
The difference between the two pieces of code is a good hint to what's going on: in the former, you are creating a reference to a deeply-nested thunk with 10000000 layers of applications of +; whereas the latter flattens the thunk at each step.
By the way, this common pitfall is explicitly called out in the documentation for modifySTRef.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With