How can I fold a lazy list using a monadic action in constant space? The problem I'm trying to solve is aggregating a large file, and I believe for the sake of performance I require mutability. I have an implementation working in ST using mutable vectors, but it uses too much memory. Below is an example of what I'm attempting. I also experimented briefly with Conduit but that didn't appear to provide any improvement.
ST forM_:
import Control.Monad (forM_)
import Control.Monad.ST.Trans as STT
import Control.Monad.Identity as Identity
testST :: Int
testST = do
Identity.runIdentity $ STT.runST $ do
a <- STT.newSTRef 0
forM_ [1..10000000] (\x -> do
a' <- STT.readSTRef a
STT.writeSTRef a (a' + x)
)
STT.readSTRef a
Conduit:
import Data.Conduit (($=),(=$),($$))
import qualified Data.Conduit as C
import qualified Data.Conduit.List as CL
testCL :: IO Int
testCL = CL.sourceList [1..10000000] $$ CL.foldM (\a x -> return (a + x)) 0
The problem is not with the fold, but with the fold body. This program allocates a lot:
testST = runST $ do
ref <- newSTRef 0
forM_ [1..10000000] $ \x -> do
val <- readSTRef ref
writeSTRef ref (val + x)
readSTRef ref
This program, whose only difference is on the writeSTRef
line, allocates almost nothing:
testST = runST $ do
ref <- newSTRef 0
forM_ [1..10000000] $ \x -> do
val <- readSTRef ref
writeSTRef ref $! val + x
readSTRef ref
The difference between the two pieces of code is a good hint to what's going on: in the former, you are creating a reference to a deeply-nested thunk with 10000000 layers of applications of +
; whereas the latter flattens the thunk at each step.
By the way, this common pitfall is explicitly called out in the documentation for modifySTRef
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With