Wikipedia writes about Hylomorphism:
In [...] functional programming a hylomorphism is a recursive function, corresponding to the composition of an anamorphism (which first builds a set of results; also known as 'unfolding') followed by a catamorphism (which then folds these results into a final return value). Fusion of these two recursive computations into a single recursive pattern then **avoids building the intermediate data structure**. This is an example of deforestation, a program optimization strategy.
(Bold markup by me)
Using the recursion-schemes library I wrote a very simple hylomorphism:
import Data.Functor.Foldable

main :: IO ()
main = putStrLn $ show $ hylosum 1000

hylosum :: Int -> Int
hylosum end = hylo alg coalg 1
  where
    -- Unfold the list of Ints from 1 to end
    coalg :: Int -> ListF Int Int
    coalg n
      | n > end   = Nil
      | otherwise = Cons n (n + 1)

    -- Sum up a list of Ints
    alg :: ListF Int Int -> Int
    alg Nil        = 0
    alg (Cons a x) = a + x
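For context (not part of my original code): hylo in recursion-schemes is essentially the following one-liner, give or take strictness details. The coalgebra unfolds one layer, fmap recurses, and the algebra immediately folds that layer, so only one ListF layer exists at a time rather than a whole list:

hylo :: Functor f => (f b -> b) -> (a -> f a) -> a -> b
hylo f g = h where h = f . fmap h . g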
In the cabal file I instructed GHC to optimize the code:
name:                Hylo
version:             0.1.0.0
synopsis:            Hylomorphisms and Deforestation
build-type:          Simple
cabal-version:       >=1.10

executable Hylo
  main-is:             Main.hs
  ghc-options:         -O2
  build-depends:       base >=4.10 && <4.11, recursion-schemes
  default-language:    Haskell2010
Using Stackage lts-10.0 (GHC 8.2.2) I compile with stack build, run with stack exec Hylo -- +RTS -s, and I get:
500500
84,016 bytes allocated in the heap
3,408 bytes copied during GC
44,504 bytes maximum residency (1 sample(s))
25,128 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
Now I change hylosum 1000 to hylosum 1000000 (1000 times more) and I get:
500000500000
16,664,864 bytes allocated in the heap
16,928 bytes copied during GC
15,756,232 bytes maximum residency (4 sample(s))
29,224 bytes maximum slop
18 MB total memory in use (0 MB lost due to fragmentation)
So heap usage goes up from 84 KB to about 16,665 KB, roughly 200 times more than before. Therefore I think GHC does not do the deforestation/fusion mentioned in Wikipedia!
This is not really a surprise: the anamorphism creates the list items from left to right (from 1 to n), while the catamorphism consumes them the opposite way, from right to left (from n to 1), and it's difficult to see how the hylomorphism can work without creating the whole intermediate list.
Questions: Is GHC able to perform the deforestation? If yes, what do I have to change in my code or cabal file, and how does it really work? If no, where is the issue: in Wikipedia, in GHC, or in the library?
The data structure is actually fused away, but the resulting program is not tail recursive. The optimized code basically looks like this (with no Cons or Nil in sight):

h n | n > end   = 0
    | otherwise = n + h (n+1)
To evaluate the result you must first evaluate h (n+1) recursively, and then add the result to n. During the recursive call, the value n must remain stored somewhere, hence we observe increased memory usage as end increases.
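For illustration (my own small example with end = 4), the evaluation unfolds roughly like this, each pending addition keeping its left operand alive until the recursion bottoms out:

h 1
= 1 + h 2
= 1 + (2 + h 3)
= 1 + (2 + (3 + h 4))
= 1 + (2 + (3 + (4 + h 5)))
= 1 + (2 + (3 + (4 + 0)))
= 10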
A tighter loop can be obtained by having the recursive call in tail position, carrying a constant-size accumulator. We want the code to optimize to this instead:
-- with BangPatterns
h n !acc | n > end   = acc
         | otherwise = h (n+1) (n + acc)
In hylosum, the call to (+) happens in alg, and we replace that with a call to a continuation that will be built up by hylo.
alg :: ListF Int (Int -> Int) -> Int -> Int
alg Nil acc = acc
alg (Cons n go) !acc = go (n + acc)
With that I see a constant 51 kB allocated in the heap.
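For completeness, here is a minimal sketch of how the pieces fit together, assuming the same coalg as before; BangPatterns is needed for the strict accumulator, and the extra 0 at the call site is the initial accumulator:

{-# LANGUAGE BangPatterns #-}
import Data.Functor.Foldable

hylosum :: Int -> Int
hylosum end = hylo alg coalg 1 0  -- seed 1, initial accumulator 0
  where
    coalg :: Int -> ListF Int Int
    coalg n
      | n > end   = Nil
      | otherwise = Cons n (n + 1)

    -- The carrier is now a function Int -> Int that threads the accumulator.
    alg :: ListF Int (Int -> Int) -> Int -> Int
    alg Nil acc          = acc
    alg (Cons n go) !acc = go (n + acc)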