Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I memoize?

I have written this function that computes Collatz sequences, and I see wildly varying times of execution depending on the spin I give it. Apparently it is related to something called "memoization", but I have a hard time understanding what it is and how it works, and, unfortunately, the relevant article on HaskellWiki, as well as the papers it links to, have all proven to not be easily surmountable. They discuss intricate details of the relative performance of highly layman-indifferentiable tree constructions, while what I miss must be some very basic, very trivial point that these sources neglect to mention.

This is the code. It is a complete program, ready to be built and executed.

module Main where

import Data.Function
import Data.List (maximumBy)

size :: (Integral a) => a
size = 10 ^ 6

-- Nail the basics.

collatz :: Integral a => a -> a
collatz n | even n = n `div` 2
          | otherwise = n * 3 + 1

recollatz :: Integral a => a -> a
recollatz = fix $ \f x -> if (x /= 1) 
                          then f (collatz x)
                          else x

-- Now, I want to do the counting with a tuple monad.

mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)

remocollatz :: Integral a => a -> ([a], a)
remocollatz = fix $ \f x -> if x /= 1
                            then f =<< mocollatz x
                            else return x

-- Trivialities.

collatzLength :: Integral a => a -> Int
collatzLength x = (length . fst $ (remocollatz x)) + 1

collatzPairs :: Integral a => a -> [(a, Int)]
collatzPairs n = zip [1..n] (collatzLength <$> [1..n])

longestCollatz :: Integral a => a -> (a, Int)
longestCollatz n = maximumBy order $ collatzPairs n
  where
    order :: Ord b => (a, b) -> (a, b) -> Ordering
    order x y = snd x `compare` snd y

main :: IO ()
main = print $ longestCollatz size

With ghc -O2 it takes about 17 seconds, without ghc -O2 -- about 22 seconds to deliver the length and the seed of the longest Collatz sequence starting at any point below size.

Now, if I make these changes:

diff --git a/Main.hs b/Main.hs
index c78ad95..9607fe0 100644
--- a/Main.hs
+++ b/Main.hs
@@ -1,6 +1,7 @@
 module Main where

 import Data.Function
+import qualified Data.Map.Lazy as M
 import Data.List (maximumBy)

 size :: (Integral a) => a
@@ -22,10 +23,15 @@ recollatz = fix $ \f x -> if (x /= 1)
 mocollatz :: Integral b => b -> ([b], b)
 mocollatz n = ([n], collatz n)

-remocollatz :: Integral a => a -> ([a], a)
-remocollatz = fix $ \f x -> if x /= 1
-                            then f =<< mocollatz x
-                            else return x
+remocollatz :: (Num a, Integral b) => b -> ([b], a)
+remocollatz 1 = return 1
+remocollatz x = case M.lookup x (table mutate) of
+    Nothing -> mutate x
+    Just y  -> y
+  where mutate x = remocollatz =<< mocollatz x
+
+table :: (Ord a, Integral a) => (a -> b) -> M.Map a b
+table f = M.fromList [ (x, f x) | x <- [1..size] ]

 -- Trivialities.

-- Then it will take just about 4 seconds with ghc -O2, but I would not live long enough to see it complete without ghc -O2.

Looking at the details of cost centres with ghc -prof -fprof-auto -O2 reveals that the first version enters collatz about a hundred million times, while the patched one -- just about one and a half million times. This must be the reason of the speedup, but I have a hard time understanding the inner workings of this magic. My best idea is that we replace a portion of expensive recursive calls with O(log n) map lookups, but I don't know if it's true and why it depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.

Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?


P.S. There are two requirements to the achieving of automagical memoization highlighted elsewhere on Stack Overflow:

  • Make a function to be memoized a top-level name.

  • Make a function to be memoized a monomorphic one.

In line with these requirements, I rebuilt remocollatz as follows:

remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = mutate x

mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x

Now it's as top level and as monomorphic as it gets. Running time is about 11 seconds, versus the similarly monomorphized table version:

remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x (table mutate) of
    Nothing -> mutate x
    Just y  -> y

mutate :: Int -> ([Int], Int)
mutate = \x -> remocollatz =<< mocollatz x

table :: (Int -> ([Int], Int)) -> M.Map Int ([Int], Int)
table f = M.fromList [ (x, f x) | x <- [1..size] ]

-- Running in less than 4 seconds.

I wonder why the memoization ghc is supposedly performing in the first case here is almost 3 times slower than my dumb table.

like image 966
Ignat Insarov Avatar asked Jan 29 '23 20:01

Ignat Insarov


1 Answers

Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?

Disclaimer: this is a guess, not verified by viewing GHC core output. A careful answer would do so to verify the conjectures outlined below. You can try peering through it yourself: add -ddump-simpl to your compilation line and you will get copious output detailing exactly what GHC has done to your code.

You write:

remocollatz x = {- ... -} table mutate {- ... -}
  where mutate x = remocollatz =<< mocollatz x

The expression table mutate in fact does not depend on x; but it appears on the right-hand side of an equation that takes x as an argument. Consequently, without optimizations, this table is recomputed each time remocollatz is called (presumably even from inside the computation of table mutate).

With optimizations, GHC notices that table mutate does not depend on x, and floats it to its own definition, effectively producing:

fresh_variable_name = table mutate
  where mutate x = remocollatz =<< mocollatz x

remocollatz x = case M.lookup x fresh_variable_name of
    {- ... -}

The table is therefore computed just once for the entire program run.

don't know why it [the performance] depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.

Sorry, but Haskell doesn't work that way. The language definition tells clearly what the meaning of a given Haskell term is, but does not say anything about the runtime or memory performance needed to compute that meaning.

like image 96
Daniel Wagner Avatar answered Feb 08 '23 08:02

Daniel Wagner