As I understand it, Haskell only garbage collects values once they go out of scope, so a top-level binding will only be evaluated once and will never go out of scope. So if I run this code in GHCi, the first 50 elements will be evaluated and saved:
let xs = map f [0..]
take 50 xs
My question is what happens when I execute the following snippet: xs !! 99. What does the garbage collector save? Does it:
- Keep results for indexes 0 - 49, thunk for indexes 50+
- Keep results for indexes 0 - 49, thunks for indexes 50 - 98, result for index 99, thunk for indexes 100+
- Keep results for indexes 0 - 99, thunk for indexes 100+
Haskell lists are linked lists made up of (:)
("cons") cells and terminated by a []
("nil") value. I'll draw such cells like this
[x] -> (tail - remainder of list)
 |
 v
(head - a value)
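For reference, the built-in list type behaves as if it were declared like the List type below. The bracket syntax is wired into the language, so the first line is only pseudocode; the second is a legal equivalent you could define yourself:

data [a]    = []  | a : [a]          -- pseudocode: the built-in list type
data List a = Nil | Cons a (List a)  -- a user-defined equivalent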
So when thinking about what is evaluated, there are two pieces to consider: first, the spine, that is, the structure of cons cells, and second, the values the list contains. Instead of 50 and 99, let's use 2 and 4, respectively.
ghci> take 2 xs
[0,1]
Printing this list forces the evaluation of the first two cons cells as well as the values within them. So your list looks like this:
[x] -> [x] -> (thunk)
 |      |
 v      v
 0      1
Now, when we run
ghci> xs !! 4
4
we haven't demanded the values at indices 2 or 3, but we need to evaluate those cons cells to get to element 4. So we have forced the spine all the way up to index 4, but we have only evaluated the value at index 4, so the list now looks like this:
[x] -> [x] -> [x] -> [x] -> [x] -> (thunk)
 |      |      |      |      |
 v      v      v      v      v
 0      1   (thunk) (thunk)  4
Nothing in this picture will be garbage-collected. However, sometimes those thunks can take up a lot of space or reference something large, and evaluating them to a plain value will allow those resources to be freed. See this answer for a small discussion of those subtleties.
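You can also watch this happen in GHCi with the :sprint command, which prints a binding without forcing it; anything still unevaluated shows up as _. A small session (using (*2) in place of your f, and a monomorphic Int list so :sprint has something concrete to report; both are assumptions made for the sake of the example):
ghci> let xs = map (*2) [0..] :: [Int]
ghci> :sprint xs
xs = _
ghci> take 2 xs
[0,2]
ghci> :sprint xs
xs = 0 : 2 : _
ghci> xs !! 4
8
ghci> :sprint xs
xs = 0 : 2 : _ : _ : 8 : _
The two _ entries in the middle are the thunks for the values we never demanded, and the final _ is the rest of the spine.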
Let's ask the profiler.
We will compile the following example program, which should do approximately what your GHCi session did. It's important that we print the results, like GHCi did, as this forces the computations.
f x = (-x)

xs = map f [0..]

main = do
    print (take 50 xs)
    print (xs !! 99)
I saved mine as example.hs. We'll compile it with options to enable profiling:
ghc -prof -fprof-auto -rtsopts example.hs
We can find out how many times f was applied with a time profile, by running the compiled program with the -p RTS option:
example +RTS -p
This produces an output file named example.prof; the following is the interesting portion:
COST CENTRE    MODULE    no.    entries
...
f              Main      78     51
We can see that f was evaluated 51 times: 50 times for the print (take 50 xs) and once for print (xs !! 99). Therefore we can rule out your third possibility; since f was only evaluated 51 times, there can't be results for all of the indices 0-99:
- Keep results for indexes 0 - 99, thunk for indexes 100+
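As a quick cross-check that doesn't require the profiler, you could also instrument f with Debug.Trace and count how often it fires. This is only a sketch, not part of the original program (trace is a debugging hack that prints to stderr as a side effect of evaluation), but it should report 51 applications as well:

import Debug.Trace (trace)

f :: Integer -> Integer
f x = trace ("f applied to " ++ show x) (negate x)

xs :: [Integer]
xs = map f [0..]

main :: IO ()
main = do
    print (take 50 xs)
    print (xs !! 99)

Running it and counting the stderr lines, for example with ./example 2>&1 >/dev/null | wc -l, gives the number of times f was evaluated.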
Profiling the memory on the heap is a bit trickier. The heap profiler takes samples, by default once every 0.1 seconds. Our program will run so fast that the heap profiler won't take any samples while it's running. We'll add a spin to our program so that the heap profiler gets a chance to take a sample. The following will spin for a number of seconds.
import Data.Time.Clock

spin :: Real a => a -> IO ()
spin seconds =
    do
        startTime <- getCurrentTime
        let endTime = addUTCTime (fromRational (toRational seconds)) startTime
        let go = do
                now <- getCurrentTime
                if now < endTime then go else return ()
        go
We don't want the garbage collector to collect the data while the program is running, so we'll add another usage of xs after the spin.
main = do
    print (take 50 xs)
    print (xs !! 99)
    spin 1
    print (xs !! 0)
We'll run this with the default heap profiling option, which groups memory usage by cost center.
example +RTS -h
This produces the file example.hp. We'll pull a sample out of the middle of the file where the numbers are stable (while it was in a spin).
BEGIN_SAMPLE 0.57
(42)PINNED 32720
(48)Data.Time.Clock.POSIX.CAF 48
(85)spin.endTime/spin/mai... 56
(84)spin.go/spin/main/Mai... 64
(81)xs/Main.CAF 4848
(82)f/xs/Main.CAF 816
(80)main/Main.CAF 160
(64)GHC.IO.Encoding.CAF 72
(68)GHC.IO.Encoding.CodeP... 136
(57)GHC.IO.Handle.FD.CAF 712
(47)Main.CAF 96
END_SAMPLE 0.57
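If the raw samples are hard to read, the hp2ps tool that ships with GHC will render the whole .hp file as a graph of heap usage over time, broken down by cost-centre:
hp2ps -c example.hp
This writes example.ps (the -c flag asks for a colour plot).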
We can see that f has produced 816 bytes of memory. For "small" Integers, an Integer consumes 2 words of memory. On my system, a word is 8 bytes of memory, so a "small" Integer takes 16 bytes. Therefore 816/16 = 51 of the Integers produced by f are probably still in memory.
We can check that all of this memory is actually being used for "small" Integers by asking for a profile by closure description with -hd. We can't group memory usage by both closure description and cost-centre at once, but we can restrict the profiling to a single cost-centre with -hc; in this case, we are interested in the f cost-centre:
example +RTS -hd -hcf
This says that all 816 bytes due to f are being used by S#, the constructor for "small" Integers:
BEGIN_SAMPLE 0.57
S# 816
END_SAMPLE 0.57
We can certainly rule out the following option, since 51 Integer results are being kept while it predicts that only 50 Integers would be kept:
- Keep results for indexes 0 - 49, thunk for indexes 50+
This leaves us with the remaining option:
- Keep results for indexes 0 - 49, thunks for indexes 50 - 98, result for index 99, thunk for indexes 100+
Let's guess how much memory this situation would be consuming.
In general, Haskell data types require 1 word of memory for the constructor and 1 word for each field. The [] type's (:) constructor has two fields, so a cons cell should take 3 words of memory, or 24 bytes. 100 (:)s would then take 2400 bytes of memory. We will see that this is exactly correct when we ask about the closure description for xs.
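As a sanity check, here is that back-of-the-envelope arithmetic written out as a tiny program (the 8-byte word size is an assumption about a 64-bit machine, matching the system used above):

-- Rough heap sizes on a 64-bit machine, where 1 word = 8 bytes.
wordSize :: Int
wordSize = 8

consCellBytes :: Int
consCellBytes = 3 * wordSize      -- (:) constructor word + 2 field words = 24 bytes

smallIntegerBytes :: Int
smallIntegerBytes = 2 * wordSize  -- S# constructor word + payload word = 16 bytes

main :: IO ()
main = do
    print (100 * consCellBytes)     -- 2400 bytes for the 100 (:) cells
    print (51 * smallIntegerBytes)  -- 816 bytes for the 51 Integers in the f cost-centre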
It's difficult to reason about the size of thunks, but we'll give it a try. There would be 49 thunks for the values at indices 50 through 98. Each of these thunks is probably holding the Integer from the generator [0..], plus the structure of a thunk itself, which unfortunately changes when profiling. There will also be a thunk for the remainder of the list, which needs the Integer from which the rest of the list is generated, plus the structure of a thunk.
Asking for a memory breakdown by closure description for the xs cost-centre
example +RTS -hd -hcxs
gives us the following sample:
BEGIN_SAMPLE 0.60
<base:GHC.Enum.sat_s34b> 32
<base:GHC.Base.sat_s1be> 32
<main:Main.sat_s1w0> 16
S# 800
<base:GHC.Base.sat_s1bd> 1568
: 2400
END_SAMPLE 0.60
We were exactly correct about there being 100 (:)s requiring 2400 bytes of memory. There are 49+1 = 50 "small" Integers (S#) occupying 800 bytes: the Integers held by the thunks for the 49 uncalculated values and by the thunk for the remainder of the list. The 1568 bytes are probably the 49 thunks for the uncalculated values, which would each then be 32 bytes, or 4 words. There are another 80 bytes we can't exactly explain that are left over for the thunk for the remainder of the list.
Both memory and time profiles are consistent with our belief that the program will
- Keep results for indexes 0 - 49, thunks for indexes 50 - 98, result for index 99, thunk for indexes 100+