Suppose I have the string
"abc"
and I want to calculate the number of bytes it occupies in memory.
I could do:
import Data.Bits
finiteBitSize ("abc" :: [Char])
but that breaks because [Char]
is not a type supported by the function. (Also it is bits not bytes, but the point was to paint a picture of what I'm looking for).
My question is: Can we write a function in Haskell to measure the number of bytes that a string of n characters occupies?
It's complicated.
Let's talk about GHC, and about String
specifically, and let's assume the thing has been fully evaluated, so we didn't get to use it iteratively in a GC-friendly way and we didn't get to delay evaluation and store a tiny thunk representing an enormous data structure.
After making all those simplifying assumptions, we're gonna need to know some definitions.
type String = [Char]
data [a] = [] | a : [a] -- pseudosyntax
data Char = C# Char# -- I'm guessing, couldn't find a canonical source
Now we're going to use a few rules of thumb. First: unboxed stuff (like Char#
) is generally stored in a machine word. We live in a world of 64-bit machines, so Char#
is probably 8 bytes, even though it likely only uses the bottom 4 bytes of it. Second: data constructors are a word to say which constructor, plus a word to point at each of the fields.
Now we're ready to tot it up.
Empty strings are []
, one word for the constructor, no words for fields, so one word total.
Nonempty strings are c : cs
, so one word for the :
constructor, one word to point to c
, one word to point to cs
, one word for the C#
constructor, one word for the Char#
. That's 5 words plus however many we need for cs
.
So for a String
of length n, we've got 5*n words to represent the body of the String
and one extra for the terminating []
. Practically speaking that's 40 bytes per character! Yikes.
And now you know why packed representations like Text
(or, when appropriate, ByteString
) are such a big deal.
There's a Haskell package called ghc-datasize
which allows you to calculate real occupied memory by every Haskell value. You can even see an example of how to calculate string size in the official Hackage documentation:
To calculate the length of the String
you need to use the following function:
recursiveSize :: a -> IO Word
It looks like this:
ghci> recursiveSize $!! "foobar"
240
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With