Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell getsizeof big integer

Tags:

haskell

I'm trying to get the size in bytes of a big integer, given that there's arbitrary precision math under the hood, and I'd like to know how much space the result is actually using.

Here's a sample:

Prelude> import Data.Bits
Prelude> let fac1000 = product [1..1000] # big!
Prelude Data.Bits> finiteBitSize fac1000

<interactive>:37:1: error:
    • Ambiguous type variable ‘b0’ arising from a use of ‘finiteBitSize’
      prevents the constraint ‘(FiniteBits b0)’ from being solved.
      Probable fix: use a type annotation to specify what ‘b0’ should be.
      These potential instances exist:
        instance FiniteBits Bool -- Defined in ‘Data.Bits’
        instance FiniteBits Int -- Defined in ‘Data.Bits’
        instance FiniteBits Word -- Defined in ‘Data.Bits’
    • In the expression: finiteBitSize fac1000
      In an equation for ‘it’: it = finiteBitSize fac1000

<interactive>:37:15: error:
    • Ambiguous type variable ‘b0’ arising from a use of ‘fac1000’
      prevents the constraint ‘(Num b0)’ from being solved.
      Probable fix: use a type annotation to specify what ‘b0’ should be.
      These potential instances exist:
        instance Num Integer -- Defined in ‘GHC.Num’
        instance Num Double -- Defined in ‘GHC.Float’
        instance Num Float -- Defined in ‘GHC.Float’
        ...plus two others
        (use -fprint-potential-instances to see them all)
    • In the first argument of ‘finiteBitSize’, namely ‘fac1000’
      In the expression: finiteBitSize fac1000
      In an equation for ‘it’: it = finiteBitSize fac1000

The type annotations they are suggesting do not seem reasonable, when I coerce to an integer for example:

Prelude Data.Bits> finiteBitSize (fac1000 :: Int)
64

Well, that's a big number and I don't believe it. In python, I get:

>>> import sys, math
>>> sys.getsizeof(math.factorial(1000))
1164

which looks a lot more credible to me for the astronomically large 4.02e2568.

like image 521
Mittenchops Avatar asked May 27 '19 23:05

Mittenchops


2 Answers

You may approximate the number of bytes by asking for its log base 256 using the integer-logarithms package:

Math.NumberTheory.Logarithms> integerLogBase 256 (product [1..1000])
1066

This is only an approximation because it accounts only for the bytes being used to store the number; typically arbitrary-precision integers also have some overhead for storing information about the length of the number and probably a bit of over-allocation, and neither of these will be accounted for by taking the logarithm.

If you don't mind getting the approximate size reported in bits instead of bytes, integerLog2 will be faster.

Math.NumberTheory.Logarithms> integerLog2 (product [1..1000])
8529

If you want the true answer, you'll have to touch some really low-level APIs and depend on the exact definition of Integer:

{-# LANGUAGE MagicHash #-}
import Data.Bits
import GHC.Exts
import GHC.Integer.GMP.Internals
import GHC.Prim

sizeOfInteger :: Integer -> Int
sizeOfInteger n = constructorSize + case n of
    S# i -> finiteBitSize (I# i) `div` 8
    Jp# bn -> sizeOfBigNat bn
    Jn# bn -> sizeOfBigNat bn
    where
    constructorSize = finiteBitSize (0 :: Word) `div` 8
    sizeOfBigNat (BN# arr) = constructorSize + I# (sizeofByteArray# arr)

Try it out in ghci:

> sizeOfInteger (product [1..1000])
1088

Beware! I don't know all the promises of these internal APIs. It's possible that computing equal Integers in different ways may produce values that are represented differently. When touching these internal APIs you sometimes lose the guarantees of the abstract external API; in this case, you may not have that x == y implies sizeOfInteger x == sizeOfInteger y. Read the docs carefully if you plan to take this route!

like image 178
Daniel Wagner Avatar answered Nov 15 '22 06:11

Daniel Wagner


You've misunderstood what finiteBitSize does. From the docs, emphasis mine:

Return the number of bits in the type of the argument. The actual value of the argument is ignored.

The finiteBitSize :: FiniteBits b => b -> Int function tells you a property of the type b, and it uses the argument to pick out which type. Any Int would give you the same answer:

ghci> finiteBitSize (0 :: Int)
64
ghci> finiteBitSize (maxBound :: Int)
64
ghci> finiteBitSize (undefined :: Int)
64

This is because Int is the type of lazy machine integers, which fit in one word. Indeed:

ghci> product [1..1000] :: Int
0

A bit smaller than you may have been expecting :-)

If you want to measure the size of product [1..1000] as an unbounded Integer, you'll need a different technique. Daniel Wagner's answer provides two good approaches, both mathematical (how to compute log2 100!) and GHC-internal.

like image 22
Antal Spector-Zabusky Avatar answered Nov 15 '22 05:11

Antal Spector-Zabusky