Recently, blog entries such as Computing the Size of a Hashmap explained how to reason about space complexities of commonly used container types. Now I'm facing the question of how to actually "see" which memory layout my GHC version chooses (depending on compile flags and target architecture) for weird data types (constructors) such as
data BitVec257 = BitVec257 {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Bool
                           {-# UNPACK #-} !Word64
                           {-# UNPACK #-} !Word64
data BitVec514 = BitVec514 {-# UNPACK #-} !BitVec257
                           {-# UNPACK #-} !BitVec257
In C there's the sizeof operator and the offsetof macro, which allow me to "see" what size and alignment were chosen for the fields of a C struct.
I've tried looking at GHC Core, hoping to find some hint there, but I didn't know what to look for. Can somebody point me in the right direction?
My first idea was to use this neat little function, due to Simon Marlow:
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Size where

import GHC.Exts
import Foreign

unsafeSizeof :: a -> Int
unsafeSizeof a =
  case unpackClosure# a of
    (# x, ptrs, nptrs #) ->
      sizeOf (undefined::Int) + -- one word for the header
        I# (sizeofByteArray# (unsafeCoerce# ptrs)
             +# sizeofByteArray# nptrs)
Using it:
Prelude> :!ghc -c Size.hs
Size.hs:15:18:
    Warning: Ignoring unusable UNPACK pragma on the
             third argument of `BitVec257'
    In the definition of data constructor `BitVec257'
    In the data type declaration for `BitVec257'
Prelude Size> unsafeSizeof $! BitVec514 (BitVec257 1 2 True 3 4) (BitVec257 1 2 True 3 4)
74
(Note that GHC is telling you that it cannot unbox Bool since it's a sum type.)
The above function claims that your data type uses 74 bytes on a 64-bit machine. I find that hard to believe. I'd expect the data type to use 11 words = 88 bytes: one header word plus one word for each of the ten fields. Even Bools take one word, as they are pointers to (statically allocated) constructors. I'm not quite sure what's going on here.
As for alignment, I believe every field should be word-aligned.
Memory footprints of Haskell Data Types
(The following applies to GHC; other compilers may use different storage conventions.)
Rule of thumb: a constructor costs one word for a header, and one word for each field. Exception: a constructor with no fields (like Nothing or True) takes no space, because GHC creates a single instance of these constructors and shares it amongst all uses.
A word is 4 bytes on a 32-bit machine, and 8 bytes on a 64-bit machine.
So e.g.
data Uno a = Uno a
data Due a b = Due a b
An Uno takes 2 words, and a Due takes 3.
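The rule of thumb above can be written down as a small calculation. This is only a sketch of the estimate, not a measurement of what GHC actually allocates; the names wordBytes, constructorWords and constructorBytes are made up for illustration, and the word size is assumed to equal the size of Int, as in the unsafeSizeof function earlier.

```haskell
import Foreign.Storable (sizeOf)

-- | Bytes per machine word on the platform running this code
-- (assumed to equal the size of Int: 8 on 64-bit, 4 on 32-bit).
wordBytes :: Int
wordBytes = sizeOf (undefined :: Int)

-- | Rule-of-thumb heap words for one constructor with the given number
-- of fields. Fields of an unpacked constructor count individually.
constructorWords :: Int -> Int
constructorWords 0 = 0      -- nullary constructors are shared, not allocated
constructorWords n = 1 + n  -- one header word plus one word per field

-- | The same estimate in bytes.
constructorBytes :: Int -> Int
constructorBytes n = wordBytes * constructorWords n
```

Loaded in GHCi, constructorWords 1 gives 2 (an Uno) and constructorWords 2 gives 3 (a Due). For BitVec514, whose two unpacked BitVec257 values contribute ten fields in total, constructorWords 10 gives the 11 words expected above.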
Also, I believe it is possible to write a Haskell function which performs the same tasks as sizeof or offsetof.
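One way to approximate this, as a rough sketch: the Storable class in base exposes sizeOf and alignment for types that can be marshalled to C, and C-style field offsets can be derived from those by rounding each offset up to the field's alignment. Note the assumption here: this models the layout of a C struct, not the layout of a GHC heap object, and the helper names field, alignUp, fieldOffsets and structSize are hypothetical.

```haskell
import Foreign.Storable (Storable, sizeOf, alignment)
import Data.Word (Word64)

-- | (size, alignment) of a field type, taken from its Storable instance.
-- The argument is never evaluated, so 'undefined' is fine.
field :: Storable a => a -> (Int, Int)
field x = (sizeOf x, alignment x)

-- | Round an offset up to the next multiple of an alignment.
alignUp :: Int -> Int -> Int
alignUp off al = ((off + al - 1) `div` al) * al

-- | C-style offsets for a sequence of (size, alignment) fields.
fieldOffsets :: [(Int, Int)] -> [Int]
fieldOffsets = go 0
  where
    go _ [] = []
    go off ((sz, al) : rest) =
      let start = alignUp off al
      in start : go (start + sz) rest

-- | C-style total size, padded to the largest field alignment.
structSize :: [(Int, Int)] -> Int
structSize fs = alignUp end maxAl
  where
    offs  = fieldOffsets fs
    end   = if null fs then 0 else last offs + fst (last fs)
    maxAl = maximum (1 : map snd fs)
```

For example, fieldOffsets [field (undefined :: Word64), field (undefined :: Bool), field (undefined :: Word64)] lists the offsets a C compiler would typically pick for three such fields on the current platform.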