Let's say I have a very large number (millions/billions+) of these simple Foo data structures:

```haskell
data Foo = Foo
  { a :: {-# UNPACK #-} !Int
  , b :: Int
  }
```
With so many of these floating around, it becomes necessary to think about how much memory they consume. On a 64-bit machine, each Int is 8 bytes, so a only takes 8 bytes (because it is strict and unpacked). But how much memory will b take up? I imagine this would change depending on whether the thunk is evaluated or not, right?
I imagine in the general case this is impossible to tell, because b could depend on any number of memory positions that are only staying in memory in case b needs to be evaluated. But what if b only depended on (some very expensive operation on) a? Then, is there a deterministic way to tell how much memory will be used?
In addition to user239558’s answer, and in response to your comment there, I’d like to point out some tools that allow you to inspect the heap representation of your value, find answers to questions like this yourself and to see the effect of optimizations and different ways of compiling.
The ghc-datasize library tells you the size of a closure. Here you can see that (on a 64-bit machine), in evaluated form and after garbage collection, Foo 1 2 requires 24 bytes on its own and, including dependencies, 40 bytes in total:
```
Prelude GHC.DataSize Test> let x = Foo 1 2
Prelude GHC.DataSize Test> x
Foo {a = 1, b = 2}
Prelude GHC.DataSize Test> System.Mem.performGC
Prelude GHC.DataSize Test> closureSize x
24
Prelude GHC.DataSize Test> recursiveSize x
40
```
To reproduce this you need to load the data definition in compiled form with -O; otherwise, the {-# UNPACK #-} pragma has no effect.
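These figures can be double-checked with plain word arithmetic. The following is my own back-of-the-envelope sketch (the constants are assumptions about a 64-bit GHC heap layout, not values taken from any library):

```haskell
-- Back-of-the-envelope check of the sizes reported above.
-- Assumption: 64-bit machine, so one word is 8 bytes.
wordSize :: Int
wordSize = 8

-- Evaluated Foo: one header word, one word for the unpacked a,
-- one word for the pointer to b.
fooClosure :: Int
fooClosure = 3 * wordSize

-- A boxed Int: one header word plus the Int# payload word.
boxedInt :: Int
boxedInt = 2 * wordSize

main :: IO ()
main = print (fooClosure, fooClosure + boxedInt)  -- prints (24,40)
```

This matches the 24 bytes from closureSize and the 40 bytes from recursiveSize in the GHCi session above.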
Now let us create a thunk and see that the size goes up considerably:
```
Prelude GHC.DataSize Test> let thunk = 2 + 3::Int
Prelude GHC.DataSize Test> let x = Foo 1 thunk
Prelude GHC.DataSize Test> x `seq` return ()
Prelude GHC.DataSize Test> System.Mem.performGC
Prelude GHC.DataSize Test> closureSize x
24
Prelude GHC.DataSize Test> recursiveSize x
400
```
Now this is quite excessive. The reason is that this calculation includes references to static closures, Num typeclass dictionaries and the like, and generally the GHCi bytecode is very unoptimized. So let's put that in a proper Haskell program. Running
```haskell
-- Foo comes from the Test module, compiled with -O so that
-- the {-# UNPACK #-} pragma takes effect.
import Test (Foo (..))
import System.Environment (getArgs)
import System.Mem (performGC)
import Debug.Trace (trace)
import GHC.DataSize (closureSize, recursiveSize)

main = do
    l <- getArgs
    let n = length l
    n `seq` return ()
    let thunk = trace "I am evaluated" $ n + n
    let x = Foo 1 thunk
    a x `seq` return ()
    performGC
    s1 <- closureSize x
    s2 <- closureSize thunk
    r <- recursiveSize x
    print (s1, s2, r)
```
gives (24, 24, 48), so now the Foo value is made up of Foo itself and a thunk. Why only the thunk? Shouldn't there also be a reference to n, adding another 16 bytes? To answer this, we need a better tool:
The ghc-heap-view library (by me) can investigate the heap and tell you precisely how your data is represented there. So adding this line to the file above:
```haskell
buildHeapTree 1000 (asBox x) >>= putStrLn . ppHeapTree
```
we get (when we pass five parameters to the program) the result Foo (_thunk 5) 1. Note that the order of arguments is swapped on the heap, because pointers always come before data. The plain 5 indicates that the closure of the thunk stores its argument unboxed.
As a last exercise, we verify this by making the thunk lazy in n. Now
```haskell
main = do
    l <- getArgs
    let n = length l
    n `seq` return ()
    let thunk = trace "I am evaluated" $ n
    let x = Foo 1 thunk
    a x `seq` return ()
    performGC
    s1 <- closureSize x
    s2 <- closureSize thunk
    s3 <- closureSize n
    r <- recursiveSize x
    buildHeapTree 1000 (asBox x) >>= putStrLn . ppHeapTree
    print (s1, s2, s3, r)
```
gives a heap representation of Foo (_thunk (I# 4)) 1 with a separate closure for n (as indicated by the presence of the I# constructor) and shows the expected sizes for the values and their total, (24, 24, 16, 64).
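Those four numbers are again consistent with simple word counting. The sketch below is my own arithmetic (assuming 8-byte words; the two-word thunk header is inferred from the 24-byte closureSize measured above, not from any official API):

```haskell
-- Assumption: 64-bit words; thunk headers occupy two words, which
-- matches the measured 24-byte closureSize for a one-pointer thunk.
w :: Int
w = 8

fooSize, thunkSize, intSize, total :: Int
fooSize   = 3 * w  -- header + unpacked a + pointer to the thunk
thunkSize = 3 * w  -- two header words + pointer to the boxed n
intSize   = 2 * w  -- I# constructor: header + payload
total     = fooSize + thunkSize + intSize

main :: IO ()
main = print (fooSize, thunkSize, intSize, total)  -- prints (24,24,16,64)
```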
Oh, and if this is still too high-level, getClosureRaw gives you the raw bytes.
If b is evaluated, it will be a pointer to an Int object. The pointer is 8 bytes, and the Int object consists of an 8-byte header and the 8-byte Int#.

So in that case the memory usage is the Foo object (8-byte header, 8-byte unpacked Int, 8-byte pointer) plus the boxed Int (8-byte header, 8-byte Int#), i.e. 24 + 16 = 40 bytes.
When b is unevaluated, the 8-byte pointer in Foo will point to a Thunk object. The Thunk object represents an unevaluated expression. Like the Int object, it has an 8-byte header, but the rest of the object consists of the free variables of the unevaluated expression.
So first of all, the number of free variables held in this thunk object depends on the expression that creates the Foo object. Different ways of creating a Foo will have thunk objects of potentially different sizes created.
Then secondly, the free variables are all the variables mentioned in the unevaluated expression that come from outside of it, called the environment of the closure. They are, in effect, the parameters of the expression: they need to be stored somewhere, and so they are stored in the thunk object.
So you can look at the actual places where the Foo constructor is invoked and look at the number of free variables in the second parameter to estimate the size of the thunk.
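That estimate can be written down as a tiny formula. This is my own sketch, not a GHC API; it assumes 8-byte words and a two-word thunk header (GHC reserves an extra header word in thunks for the later update with the result), which matches the 24-byte closureSize measured in the other answer:

```haskell
-- Rough thunk-size estimate (my own sketch, not a GHC API).
-- Assumes 8-byte words and a two-word thunk header.
thunkBytes :: Int -> Int
thunkBytes nFreeVars = (2 + nFreeVars) * 8

main :: IO ()
main = print (thunkBytes 1)  -- one captured variable: prints 24
```

Note that values the expression can reach only *through* its free variables (the dependencies mentioned in the question) are not counted here; they are retained separately on the heap.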
A Thunk object is really the same as a closure in most other programming languages, with one important difference: when it is evaluated, it can be overwritten by a redirect pointer (an indirection, in GHC terminology) to the evaluated object. Thus it is a closure that automatically memoizes its result.

This redirect pointer will point to the Int object (16 bytes). However, the now-"dead" thunk will be eliminated on the next garbage collection: when the GC copies Foo, it will make Foo's b point directly to the Int object, making the thunk unreferenced and thus garbage.