Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does this Haskell program leak space when compiled with optimizations?

Consider the following toy program that computes all combinations of character substitutions in a word, of the kind often used in passwords.

import Data.Char (isLower, toUpper)

variants :: String -> [String]
variants "" = [""]
variants (c:s) = [c':s' | c' <- subst c, s' <- variants s]
  where subst 'a' = "aA@"
        subst 'e' = "eE3"
        subst 'i' = "iI1"
        subst 'l' = "lL1"
        subst 'o' = "oO0"
        subst 's' = "sS$5"
        subst 'z' = "zZ2"
        subst x | isLower x = [x, toUpper x]
        subst x = [x]

main :: IO ()
main = putStrLn $ show $ length $ variants "redistributables"

I compile it with and without optimizations:

$ ghc -fforce-recomp -Wall Test.hs -o test0
[1 of 1] Compiling Main             ( Test.hs, Test.o )
Linking test0 ...

$ ghc -fforce-recomp -O -Wall Test.hs -o test1
[1 of 1] Compiling Main             ( Test.hs, Test.o )
Linking test1 ...

Now test0 and test1 produce the same output, but test1 uses much more memory and spends most of its time in garbage collection:

$ ./test0 +RTS -s 2>&1 | grep total
               2 MB total memory in use (0 MB lost due to fragmentation)
  Productivity  93.2% of total user, 93.3% of total elapsed

$ ./test1 +RTS -s 2>&1 | grep total
             188 MB total memory in use (0 MB lost due to fragmentation)
  Productivity  15.0% of total user, 15.0% of total elapsed

Why?

I’m using GHC 7.4.1; I should probably use a newer compiler, but this is what I have handy at the moment, and the fault probably lies with me anyway.

like image 965
Vasiliy Faronov Avatar asked Jun 23 '15 19:06

Vasiliy Faronov


2 Answers

You want

variants (c:s) = [c':s' | c' <- subst c, s' <- variants s]

to be compiled into an outer loop and an inner loop. But GHC sees that the inner loop does not depend in any way on the outer "loop counter". Therefore, the full laziness transformation lifts the inner loop out of the outer loop. One fairly effective trick is to hide the fact that the inner loop is independent. This is done by splitting the inner loop off into a separate function taking a dummy argument, and hiding the dumminess by marking the function as NOINLINE. Then you can call the function with the outer loop counter, and GHC will generally refrain from messing with you.

like image 105
dfeuer Avatar answered Sep 28 '22 07:09

dfeuer


The trick is to cause the recomputation of suffixes, instead of their retention in memory. It's like with the

powerset (x:xs) = map (x:) (powerset xs) ++ powerset xs 

definition, where adding the where clause is harmful (or is it powerset (x:xs) = powerset xs ++ map (x:) (powerset xs) ...?).

In your case, the code to try is mapM subst, or

variants (c:cs) = variants cs >>= \s-> map (:s) (subst c) 

You can see that the latter works in the "opposite direction" from your list comprehension code, so maybe just

variants (c:s) = [c':s' | s' <- variants s, c' <- subst c]

will work, too.

All these are equivalent, so it's a compiler thing. Hopefully someone can provide more specifics about that.

like image 26
Will Ness Avatar answered Sep 28 '22 08:09

Will Ness