Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Where is the space leak when using scanl? (Haskell)

Consider this piece of code:

incState state i = map (+ i) state

main =
  sequence_ $ do
    n <- map last $ scanl incState (replicate 1000000 0) (repeat 1)
    return $ print $ n

When I run it, memory usage keeps increasing without bounds, as it prints out numbers. Where is the memory leak here? It should only be keeping around a constant amount of memory, previous values of state should be discarded whenever it prints something.

Even if sequence_ or the list monad is holding onto values of n, that is only one integer per iteration. I'm seeing 6Gs of memory being used after it only counts up to 100. A million integers should only be taking up 10Ms at most.

like image 383
yong Avatar asked Mar 18 '23 05:03

yong


1 Answers

Nothing is forcing the intermediate calculations through the middle of the list. If you replace last with sum it stops leaking memory.

To see what's going on, let's consider a slightly simpler statement of the same thing:

main = mapM (print . last) $ scanl incState (replicate 5 0) (repeat 1)

Now, let's start evaluating it. To begin with, the memory for our state just looks like

scanl incState (map (+1) (replicate 5 0)) (repeat 1)

We'll skip some of the details, but the first demand we have is that mapM pattern matches against the start of the list of states.

(:) -> replicate 5 0 
 |     ^
 |     |_________________                     
 v                       |
scanl incState (map (+1) | ) (repeat 1)

To get the last item to print, we force the spine of the list and the last item, but not the individual items. We'll see this better in the next step.

       0 : 0 : 0 : 0 : 0 : [] 
       ^
       |_________________                     
                         |
scanl incState (map (+1) | ) (repeat 1)

For the next step of mapM, we need the next item from the list of states. The memory now looks something like

       0 : 0 : 0 : 0 : 0 : [] 
       ^
       |________                    
                |
(:) -> map (+1) |
 |     ^
 |     |_________________                     
 v                       |
scanl incState (map (+1) | ) (repeat 1)

To get the last item to print, we force the spine of the list and the last item, but not the other individual items.

       0   : 0   : 0   : 0   : 0 : [] 
       ^     ^     ^     ^
       |     |     |     |     
       |+1 : |+1 : |+1 : |+1 : 1 : []
       ^
       |_________________                     
                         |
scanl incState (map (+1) | ) (repeat 1)

For the next step of mapM, we need the next item from the list of states. The memory now looks something like

       0   : 0   : 0   : 0   : 0 : [] 
       ^     ^     ^     ^
       |     |     |     |     
       |+1 : |+1 : |+1 : |+1 : 1 : []
       ^
       |________                    
                |
(:) -> map (+1) |
 |     ^
 |     |_________________                     
 v                       |
scanl incState (map (+1) | ) (repeat 1)

To get the last item to print, we force the spine of the list and the last item, but not the other individual items.

       0   : 0   : 0   : 0   : 0 : [] 
       ^     ^     ^     ^
       |     |     |     |     
       |+1 : |+1 : |+1 : |+1 : 1 : []
       ^     ^     ^     ^
       |     |     |     |     
       |+1 : |+1 : |+1 : |+1 : 2 : []
       ^
       |_________________                     
                         |
scanl incState (map (+1) | ) (repeat 1)

Each time we repeat this, we leave behind another list of thunks, referring to the previous list of thunks, etc. That's where the space leak is coming from. If you force all of the results after each step, like print . sum would do, there will not be a space leak.

like image 176
Cirdec Avatar answered Mar 28 '23 19:03

Cirdec