Can a thunk be duplicated to improve memory performance?

Question

One of my struggles with lazy evaluation in Haskell is the difficulty of reasoning about memory usage. I think the ability to duplicate a thunk would make this much easier for me. Here's an example.

Let's create a really big list:

let xs = [1..10000000]

Now, let's create a bad function:

bad = do
    print $ foldl1' (+) xs
    print $ length xs

With no optimizations, this eats up a few dozen MB of ram. The garbage collector can't deallocate xs during the fold because it will be needed for calculating the length later.

Is it possible to reimplement this function something like this:

good = do
    (xs1,xs2) <- copyThunk xs
    print $ foldl1' (+) xs1
    print $ length xs2

Now, xs1 and xs2 would represent the same value, but also be independent of each other in memory so the garbage collector can deallocate during the fold preventing memory wasting. (I think this would slightly increase the computational cost though?)

Obviously in this trivial example, refactoring the code could easily solve this problem, but It seems like it's not always obvious how to refactor. Or sometimes refactoring would greatly reduce code clarity.

Joachim Breitner · Accepted Answer

I was wondering the same thing a while ago and created a prototypical implementation of such a thunk-duplication function. You can read about the result in my preprint „dup – Explicit un-sharing in haskell” and see the code at http://darcs.nomeata.de/ghc-dup. Unfortunately, the paper was neither accepted for the Haskell Symposium nor the Haskell Implementors Workshop this year.

To my knowledge, there is no real-world-ready solution to the problem; only fragile work-arounds as the unit parameter trick that might break due to one or the other compiler optimizations.

Tarrasch · Answer

Interesting question. I don't know how to implement copyThunk. But there is something else you can do (sorry if you already knew this):

xsFunction :: () -> [Int]
xsFunction = const [1..10000000]

better = do
  print $ foldl1' (+) $ xsFunction ()
  print $ length $ xsFunction ()

Here it definitely won't put the expression xsFunction () in a thunk, it will be calculated twice thus not making any memory bloat.

An interesting follow up on this is:

Can one ever implement copyThunk?
Should a haskell programmer ever be messing around with this relatively low level optimizations? Can't we assume ghc to outsmart us on this?

ertes · Answer

Turn xs into a function. This may be ugly, but works, because it prevents sharing:

let xs () = [1..1000000]

good = do
    print $ foldl1' (+) (xs ())
    print $ length (xs ())

Can a thunk be duplicated to improve memory performance?

Tags:

haskell

lazy-evaluation

Mike Izbicki

3 Answers

Joachim Breitner

Tarrasch

ertes

Recent Activity

Donate For Us

Can a thunk be duplicated to improve memory performance?

Tags:

haskell

lazy-evaluation

Mike Izbicki

3 Answers

Joachim Breitner

Tarrasch

ertes

Related questions

Recent Activity

Donate For Us