Mind the following Haskell program:
-- (^) allocs memory so we define it using the native (**)
pow :: Int -> Int -> Int
pow x y = floor $ fromIntegral x ** fromIntegral y
-- tail recursive, div and mod are native, I believe, so, no alloc
isPalindrome :: Int -> Bool
isPalindrome x = go (pow 10 (floor $ logBase 10 $ fromIntegral x)) x
where go m x = x <= 0 || div x m == mod x 10 && go (div m 100) (div (x - m * mod x 10) 10)
-- go is tail recursive too... no obvious allocation here
wanderer :: Int -> Int
wanderer n = go 0 (pow 10 n - 1) (pow 10 n - 1) where
go p x y | p > 0 && div p x >= pow 10 n = p
go p x y | p > 0 && y < div p x || y < x = go p (x-1) (pow 10 n - 1)
go p x y | isPalindrome (x*y) = go (x*y) x (y-1)
go p x y = go p x (y-1)
main = print . wanderer $ 8
Profiling it, we get:
Fri May 8 03:36 2015 Time and Allocation Profiling Report (Final)
aff +RTS -p -RTS
total time = 7.34 secs (7344 ticks @ 1000 us, 1 processor)
total alloc = 6,487,919,472 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
isPalindrome Main 41.9 18.5
isPalindrome.go Main 22.6 1.4
wanderer.go Main 20.0 67.8
pow Main 15.5 12.3
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 47 0 0.0 0.0 100.0 100.0
main Main 95 0 0.0 0.0 0.0 0.0
CAF:main1 Main 92 0 0.0 0.0 0.0 0.0
main Main 94 1 0.0 0.0 0.0 0.0
CAF:main2 Main 91 0 0.0 0.0 100.0 100.0
main Main 96 0 0.0 0.0 100.0 100.0
wanderer Main 98 1 0.0 0.0 100.0 100.0
pow Main 101 1 0.0 0.0 0.0 0.0
wanderer.go Main 99 49995002 20.0 67.8 100.0 100.0
isPalindrome Main 102 49985002 41.9 18.5 80.0 32.2
pow Main 104 49985002 15.5 12.3 15.5 12.3
isPalindrome.go Main 103 52207117 22.6 1.4 22.6 1.4
pow Main 100 1 0.0 0.0 0.0 0.0
pow Main 97 2 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 85 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 78 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 76 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 69 0 0.0 0.0 0.0 0.0
CAF GHC.Event.Thread 55 0 0.0 0.0 0.0 0.0
As far as I'm aware, it seems like all my functions are tail recursive and those prelude functions are asm operations. Yet this simple program allocs 7gb of memory. Where is all the allocation coming from?
Haskell computations produce a lot of memory garbage - much more than conventional imperative languages. It's because data are immutable so the only way to store every next operation's result is to create new values. In particular, every iteration of a recursive computation creates a new value.
The Haskell runtime system employs a generational garbage collector (GC) with two generations 2. Generations are numbered starting with the youngest generation at zero.
The answer is that a GC is required under the hood to reclaim the heap objects that the language MUST create. For example. A pure function needs to create heap objects because in some cases it has to return them. That means that they can't be allocated on the stack.
By default, Haskell heap objects are stored “boxed” — they are represented as a pointer to an object on the heap. Unboxed objects on the other hand are represented as the actual raw data. In some languages these would also be referred to as reference and value types.
The allocation is coming from the go
in isPalindrome
:
go m x = x <= 0 || div x m == mod x 10 && go (div m 100) (div (x - m * mod x 10) 10)
We have a ||
on the right hand side. The short-circuit semantics of ||
is realized through lazy evaluation. GHC sees that the m
argument is unused if x <= 0
evaluates to True
, so it doesn't unbox m
, allowing it to remain uncomputed. Of course, in this case we're better off unboxing m
too, so let's do that.
{-# LANGUAGE BangPatterns #-}
go !m x = ...
Now with ghc -O2
and +RTS -s
:
52,016 bytes allocated in the heap
3,408 bytes copied during GC
44,312 bytes maximum residency (1 sample(s))
17,128 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With