Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can a thunk be duplicated to improve memory performance?

One of my struggles with lazy evaluation in Haskell is the difficulty of reasoning about memory usage. I think the ability to duplicate a thunk would make this much easier for me. Here's an example.

Let's create a really big list:

let xs = [1..10000000]

Now, let's create a bad function:

bad = do
    print $ foldl1' (+) xs
    print $ length xs

With no optimizations, this eats up a few dozen MB of ram. The garbage collector can't deallocate xs during the fold because it will be needed for calculating the length later.

Is it possible to reimplement this function something like this:

good = do
    (xs1,xs2) <- copyThunk xs
    print $ foldl1' (+) xs1
    print $ length xs2

Now, xs1 and xs2 would represent the same value, but also be independent of each other in memory so the garbage collector can deallocate during the fold preventing memory wasting. (I think this would slightly increase the computational cost though?)

Obviously in this trivial example, refactoring the code could easily solve this problem, but It seems like it's not always obvious how to refactor. Or sometimes refactoring would greatly reduce code clarity.

like image 624
Mike Izbicki Avatar asked Jul 26 '12 18:07

Mike Izbicki


3 Answers

I was wondering the same thing a while ago and created a prototypical implementation of such a thunk-duplication function. You can read about the result in my preprint „dup – Explicit un-sharing in haskell” and see the code at http://darcs.nomeata.de/ghc-dup. Unfortunately, the paper was neither accepted for the Haskell Symposium nor the Haskell Implementors Workshop this year.

To my knowledge, there is no real-world-ready solution to the problem; only fragile work-arounds as the unit parameter trick that might break due to one or the other compiler optimizations.

like image 153
Joachim Breitner Avatar answered Nov 13 '22 00:11

Joachim Breitner


Interesting question. I don't know how to implement copyThunk. But there is something else you can do (sorry if you already knew this):

xsFunction :: () -> [Int]
xsFunction = const [1..10000000]

better = do
  print $ foldl1' (+) $ xsFunction ()
  print $ length $ xsFunction ()

Here it definitely won't put the expression xsFunction () in a thunk, it will be calculated twice thus not making any memory bloat.


An interesting follow up on this is:

  • Can one ever implement copyThunk?
  • Should a haskell programmer ever be messing around with this relatively low level optimizations? Can't we assume ghc to outsmart us on this?
like image 24
Tarrasch Avatar answered Nov 12 '22 22:11

Tarrasch


Turn xs into a function. This may be ugly, but works, because it prevents sharing:

let xs () = [1..1000000]

good = do
    print $ foldl1' (+) (xs ())
    print $ length (xs ())
like image 35
ertes Avatar answered Nov 13 '22 00:11

ertes