
Can Haskell evaluate and not garbage collect random indexes in a list?

As I understand it, Haskell only garbage collects when something goes out of scope, so a top level binding will only be evaluated once and never go out of scope. So if I run this code in GHCI, the first 50 elements will be evaluated and saved.

let xs = map f [0..]
take 50 xs

My question is what happens when I execute the following snippet: xs !! 99. What does the garbage collector save? Does it

  1. Keep results for indexes 0 - 49, thunk for indexes 50 - 98, result for index 99, thunk for indexes 100+
  2. Keep results for indexes 0 - 49, thunk for indexes 50+
  3. Keep results for indexes 0 - 99, thunk for indexes 100+
asked Sep 09 '14 by Ramith Jayatilleka

2 Answers

Haskell lists are linked lists made up of (:) ("cons") cells and terminated by a [] ("nil") value. I'll draw such cells like this

 [x] -> (tail - remainder of list)
  |
  v
(head - a value)

So when thinking about what is evaluated, there are two pieces to consider. First is the spine, that is the structure of cons cells, and second is the values the list contains. Instead of 50 and 99, let's use 2 and 4, respectively.

ghci> take 2 xs
[0,1]

Printing this list forces the evaluation of the first two cons cells as well as the values within them. So your list looks like this:

[x] -> [x] -> (thunk)
 |      |
 v      v
 0      1

Now, when we

ghci> xs !! 4
4

we haven't demanded the 2nd or 3rd values, but we need to evaluate those cons cells to get to the 4th element. So we have forced the spine all the way up to the 4th element, but we have only evaluated the 4th value, so the list now looks like this:

[x] -> [x] -> [x] -> [x] -> [x] -> (thunk)
 |      |      |      |      |
 v      v      v      v      v
 0      1   (thunk) (thunk)  4

Nothing in this picture will be garbage-collected. However, sometimes those thunks can take up a lot of space or reference something large, and evaluating them to a plain value will allow those resources to be freed. See this answer for a small discussion of those subtleties.
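You can watch exactly which values get forced with Debug.Trace; a minimal sketch (the index labels in the messages are just annotations I've added):

```haskell
import Debug.Trace (trace)

-- Wrap each value in a trace, so stderr reports which list *values*
-- are forced; the spine is forced silently as (!!) walks the cells.
f :: Integer -> Integer
f i = trace ("forcing value at index " ++ show i) i

xs :: [Integer]
xs = map f [0..]

main :: IO ()
main = do
  print (take 2 xs)  -- traces indexes 0 and 1
  print (xs !! 4)    -- traces only index 4; values 2 and 3 stay thunks
```

Running this, the trace for indexes 2 and 3 never fires, matching the picture above.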

answered Nov 08 '22 by luqui


Let's ask the profiler.

We will compile the following example program that should do approximately what your GHCI session did. It's important that we print the results, like GHCI did, as this forces the computations.

f x = (-x)

xs = map f [0..]

main = do
    print (take 50 xs)
    print (xs !! 99)

I saved mine as example.hs. We'll compile it with options to enable profiling

ghc -prof -fprof-auto -rtsopts example.hs

Time profile

We can find out how many times f was applied with a time profile.

example +RTS -p

This produces an output file named example.prof, the following is the interesting portion:

COST CENTRE MODULE                     no.     entries
...
   f        Main                        78          51 

We can see that f was evaluated 51 times: 50 times for print (take 50 xs) and once for print (xs !! 99). Since f was only evaluated 51 times, there can't be results for all of the indexes 0-99, so we can rule out your third possibility:

  3. Keep results for indexes 0 - 99, thunk for indexes 100+

Heap profile of results

Profiling the memory on the heap is a bit trickier. The heap profiler takes samples, by default once every 0.1 seconds. Our program will run so fast that the heap profiler won't take any samples while it's running. We'll add a spin to our program so that the heap profiler gets a chance to take a sample. The following will spin for a number of seconds.

import Data.Time.Clock

spin :: Real a => a -> IO ()
spin seconds =
    do
        startTime <- getCurrentTime 
        let endTime = addUTCTime (fromRational (toRational seconds)) startTime
        let go = do
            now <- getCurrentTime
            if now < endTime then go else return ()
        go

We don't want the garbage collector to collect the data while the program is running, so we'll add another usage of xs after the spin.

main = do
    print (take 50 xs)
    print (xs !! 99)
    spin 1
    print (xs !! 0)

We'll run this with the default heap profiling option, which groups memory usage by cost center.

example +RTS -h

This produces the file example.hp. We'll pull a sample out of the middle of the file where the numbers are stable (while it was in a spin).

BEGIN_SAMPLE 0.57
(42)PINNED  32720
(48)Data.Time.Clock.POSIX.CAF   48
(85)spin.endTime/spin/mai...    56
(84)spin.go/spin/main/Mai...    64
(81)xs/Main.CAF 4848
(82)f/xs/Main.CAF   816
(80)main/Main.CAF   160
(64)GHC.IO.Encoding.CAF 72
(68)GHC.IO.Encoding.CodeP...    136
(57)GHC.IO.Handle.FD.CAF    712
(47)Main.CAF    96
END_SAMPLE 0.57

We can see that f has produced 816 bytes of memory. For "small" Integers, an Integer consumes 2 words of memory. On my system, a word is 8 bytes of memory, so a "small" Integer takes 16 bytes. Therefore 816/16 = 51 of the Integers produced by f are probably still in memory.
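That size assumption can be written out as a quick check (assuming an 8-byte word, as on a typical 64-bit system):

```haskell
-- A "small" Integer closure: one word for the constructor, one for the payload.
wordBytes, smallIntegerBytes :: Int
wordBytes = 8
smallIntegerBytes = 2 * wordBytes

-- Number of small Integers accounted for by the 816 bytes in the sample.
liveIntegers :: Int
liveIntegers = 816 `div` smallIntegerBytes

main :: IO ()
main = print liveIntegers  -- 51
```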

We can check that all of this memory is actually being used for "small" Integers by asking for a profile by closure description with -hd. We can't group memory usage by both closure description and cost-centre, but we can restrict the profiling to a single cost-centre with -hc. In this case, we are interested in the f cost-centre.

example +RTS -hd -hcf

This says that all 816 bytes due to f are being used by S#, the constructor for "small" Integers:

BEGIN_SAMPLE 0.57
S#  816
END_SAMPLE 0.57

We can also rule out the following, since 51 Integer results are being kept, while this option expects only 50:

  2. Keep results for indexes 0 - 49, thunk for indexes 50+

Heap profile of structure and thunks

This leaves us with option

  1. Keep results for indexes 0 - 49, thunks for indexes 50 - 98, result for index 99, thunk for indexes 100+

Let's guess how much memory this situation would be consuming.

In general, Haskell data types require 1 word of memory for the constructor, and 1 word for each field. The [] type's : constructor has two fields, so it should take 3 words of memory, or 24 bytes. 100 :s would then take 2400 bytes of memory. We will see that this is exactly correct when we ask about the closure description for xs.

It's difficult to reason about the size of thunks, but we'll give it a try. There would be 49 thunks for the values of indexes [50, 98]. Each of these thunks is probably holding the Integer from the generator [0..]. It also needs to hold the structure of a thunk, which unfortunately changes when profiling. There will also be a thunk for the remainder of the list; it will need the Integer from which the rest of the list is generated, plus the structure of a thunk.

Asking for a memory breakdown by closure description for the xs cost-centre

example +RTS -hd -hcxs

Gives us the following sample

BEGIN_SAMPLE 0.60
<base:GHC.Enum.sat_s34b>    32
<base:GHC.Base.sat_s1be>    32
<main:Main.sat_s1w0>    16
S#  800
<base:GHC.Base.sat_s1bd>    1568
:   2400
END_SAMPLE 0.60

We were exactly right about there being 100 :s requiring 2400 bytes of memory. There are 49+1 = 50 "small" Integers (S#) occupying 800 bytes: one for each of the 49 thunks for the uncalculated values, plus one for the thunk for the remainder of the list. The 1568 bytes are probably the 49 thunks for the uncalculated values, which would then each be 32 bytes, or 4 words. The remaining 80 bytes, which we can't exactly explain, are left over for the thunk for the remainder of the list.
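As a sanity check, these sizes add up to the whole sample; a sketch (8-byte words assumed; the 80-byte remainder matches the sum of the three small sat_* closures at the top of the sample, 32 + 32 + 16):

```haskell
word :: Int
word = 8

consCell, smallInteger, profiledThunk :: Int
consCell = 3 * word       -- (:) constructor word plus two field words
smallInteger = 2 * word   -- S# constructor word plus payload word
profiledThunk = 4 * word  -- apparent per-thunk size while profiling

total :: Int
total = 100 * consCell     -- 2400 bytes of spine
      + 50 * smallInteger  -- 800 bytes: 49 pending values + the tail's Integer
      + 49 * profiledThunk -- 1568 bytes of value thunks
      + 32 + 32 + 16       -- the remaining small closures in the sample

main :: IO ()
main = print total  -- 4848, matching the xs/Main.CAF figure
```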

Both memory and time profiles are consistent with our belief that the program will

  1. Keep results for indexes 0 - 49, thunks for indexes 50 - 98, result for index 99, thunk for indexes 100+
answered Nov 08 '22 by Cirdec