One of my struggles with lazy evaluation in Haskell is the difficulty of reasoning about memory usage. I think the ability to duplicate a thunk would make this much easier for me. Here's an example.
Let's create a really big list:
let xs = [1..10000000]
Now, let's create a bad function:
bad = do
print $ foldl1' (+) xs
print $ length xs
With no optimizations, this eats up a few dozen MB of ram. The garbage collector can't deallocate xs during the fold because it will be needed for calculating the length later.
Is it possible to reimplement this function something like this:
good = do
(xs1,xs2) <- copyThunk xs
print $ foldl1' (+) xs1
print $ length xs2
Now, xs1 and xs2 would represent the same value, but also be independent of each other in memory so the garbage collector can deallocate during the fold preventing memory wasting. (I think this would slightly increase the computational cost though?)
Obviously in this trivial example, refactoring the code could easily solve this problem, but It seems like it's not always obvious how to refactor. Or sometimes refactoring would greatly reduce code clarity.
I was wondering the same thing a while ago and created a prototypical implementation of such a thunk-duplication function. You can read about the result in my preprint „dup – Explicit un-sharing in haskell” and see the code at http://darcs.nomeata.de/ghc-dup. Unfortunately, the paper was neither accepted for the Haskell Symposium nor the Haskell Implementors Workshop this year.
To my knowledge, there is no real-world-ready solution to the problem; only fragile work-arounds as the unit parameter trick that might break due to one or the other compiler optimizations.
Interesting question. I don't know how to implement copyThunk
. But there is something else you can do (sorry if you already knew this):
xsFunction :: () -> [Int]
xsFunction = const [1..10000000]
better = do
print $ foldl1' (+) $ xsFunction ()
print $ length $ xsFunction ()
Here it definitely won't put the expression xsFunction ()
in a thunk, it will be calculated twice thus not making any memory bloat.
An interesting follow up on this is:
copyThunk
?Turn xs
into a function. This may be ugly, but works, because it prevents sharing:
let xs () = [1..1000000]
good = do
print $ foldl1' (+) (xs ())
print $ length (xs ())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With