I'm trying to get the size in bytes of a big integer, given that there's arbitrary precision math under the hood, and I'd like to know how much space the result is actually using.
Here's a sample:
Prelude> import Data.Bits
Prelude> let fac1000 = product [1..1000] # big!
Prelude Data.Bits> finiteBitSize fac1000
<interactive>:37:1: error:
• Ambiguous type variable ‘b0’ arising from a use of ‘finiteBitSize’
prevents the constraint ‘(FiniteBits b0)’ from being solved.
Probable fix: use a type annotation to specify what ‘b0’ should be.
These potential instances exist:
instance FiniteBits Bool -- Defined in ‘Data.Bits’
instance FiniteBits Int -- Defined in ‘Data.Bits’
instance FiniteBits Word -- Defined in ‘Data.Bits’
• In the expression: finiteBitSize fac1000
In an equation for ‘it’: it = finiteBitSize fac1000
<interactive>:37:15: error:
• Ambiguous type variable ‘b0’ arising from a use of ‘fac1000’
prevents the constraint ‘(Num b0)’ from being solved.
Probable fix: use a type annotation to specify what ‘b0’ should be.
These potential instances exist:
instance Num Integer -- Defined in ‘GHC.Num’
instance Num Double -- Defined in ‘GHC.Float’
instance Num Float -- Defined in ‘GHC.Float’
...plus two others
(use -fprint-potential-instances to see them all)
• In the first argument of ‘finiteBitSize’, namely ‘fac1000’
In the expression: finiteBitSize fac1000
In an equation for ‘it’: it = finiteBitSize fac1000
The type annotations they are suggesting do not seem reasonable, when I coerce to an integer for example:
Prelude Data.Bits> finiteBitSize (fac1000 :: Int)
64
Well, that's a big number and I don't believe it. In python, I get:
>>> import sys, math
>>> sys.getsizeof(math.factorial(1000))
1164
which looks a lot more credible to me for the astronomically large 4.02e2568.
You may approximate the number of bytes by asking for its log base 256 using the integer-logarithms package:
Math.NumberTheory.Logarithms> integerLogBase 256 (product [1..1000])
1066
This is only an approximation because it accounts only for the bytes being used to store the number; typically arbitrary-precision integers also have some overhead for storing information about the length of the number and probably a bit of over-allocation, and neither of these will be accounted for by taking the logarithm.
If you don't mind getting the approximate size reported in bits instead of bytes, integerLog2
will be faster.
Math.NumberTheory.Logarithms> integerLog2 (product [1..1000])
8529
If you want the true answer, you'll have to touch some really low-level APIs and depend on the exact definition of Integer
:
{-# LANGUAGE MagicHash #-}
import Data.Bits
import GHC.Exts
import GHC.Integer.GMP.Internals
import GHC.Prim
sizeOfInteger :: Integer -> Int
sizeOfInteger n = constructorSize + case n of
S# i -> finiteBitSize (I# i) `div` 8
Jp# bn -> sizeOfBigNat bn
Jn# bn -> sizeOfBigNat bn
where
constructorSize = finiteBitSize (0 :: Word) `div` 8
sizeOfBigNat (BN# arr) = constructorSize + I# (sizeofByteArray# arr)
Try it out in ghci:
> sizeOfInteger (product [1..1000])
1088
Beware! I don't know all the promises of these internal APIs. It's possible that computing equal Integer
s in different ways may produce values that are represented differently. When touching these internal APIs you sometimes lose the guarantees of the abstract external API; in this case, you may not have that x == y
implies sizeOfInteger x == sizeOfInteger y
. Read the docs carefully if you plan to take this route!
You've misunderstood what finiteBitSize
does. From the docs, emphasis mine:
Return the number of bits in the type of the argument. The actual value of the argument is ignored.
The finiteBitSize :: FiniteBits b => b -> Int
function tells you a property of the type b
, and it uses the argument to pick out which type. Any Int
would give you the same answer:
ghci> finiteBitSize (0 :: Int)
64
ghci> finiteBitSize (maxBound :: Int)
64
ghci> finiteBitSize (undefined :: Int)
64
This is because Int
is the type of lazy machine integers, which fit in one word. Indeed:
ghci> product [1..1000] :: Int
0
A bit smaller than you may have been expecting :-)
If you want to measure the size of product [1..1000]
as an unbounded Integer
, you'll need a different technique. Daniel Wagner's answer provides two good approaches, both mathematical (how to compute log2 100!) and GHC-internal.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With