Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

STUArray s i e - space leak only when i == Int?

I am confused by the behaviour of the following snipped:

import Data.Int
import Data.Array.ST
import Control.Monad.ST

{-# INLINE fib #-}
fib _ 0 = return 0
fib _ 1 = return 1
fib c n = do
  f1 <- memo c (fib c) (n-1)
  f2 <- memo c (fib c) (n-2)
  return (f1+f2)

newtype C a = C a

{-# INLINE memo #-}
memo (C a) f k = do
  e <- readArray a k
  if e == (-1)
     then do
       v <- f k
       writeArray a k v
       return v
     else return e

evalFib :: Int -> Int
evalFib n = runST $ do
  a <- newArray (0,n) (-1) :: ST s (STUArray s Int Int)
  fib (C a) n

main = print $ evalFib 120000

When compiled with -O2 it stack-overflows (showing 20M of memory in use). The confusing part is that it actually works as expected (no stack overflow and 9M memory in use) if any of the following changes is made:

  • Int64 is used instead of Int: (giving evalFib :: Int64 -> Int and STUArray s Int64 Int). In fact any Int* (Int32, Int16, etc) will do the trick as well as Word or Word*;
  • newtype C a is removed from the picture;
  • data C a = C !a is used instead of newtype

I am trying to get my head around this behaviour: is it a bug in GHC/array module (it shows identical behaviour in 7.4.2 and 7.6.2) or is it supposed to work that way?

PS The funny thing is that when I try to compile it with ghc array.hs -O2 -fext-core to see the differences in core produced both GHC versions fail with "ghc: panic! (the 'impossible' happened)". No luck here either..

like image 549
Ed'ka Avatar asked Feb 23 '13 07:02

Ed'ka


2 Answers

Your initial program, as you say, stack overflows:

$ ghc -O2 A.hs --make -rtsopts
$ ./A +RTS -s
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
      21,213,928 bytes allocated in the heap
       3,121,864 bytes copied during GC
       5,400,592 bytes maximum residency (4 sample(s))
          20,464 bytes maximum slop
              20 MB total memory in use (0 MB lost due to fragmentation)

While if we change to Int64 it works fine:

$ time ./A
-2092835058118682368
./A  0.03s user 0.01s system 92% cpu 0.050 total

So where's the leak?

Just visually inspecting your code, there's an obvious issue:

  • fib is lazy in its array argument

You can also infer this from the behavior you saw:

newtype C a is removed from the picture;

data C a = C !a is used instead of newtype

All of which server to expose the array to the strictness analyzer.

If you make fib strict in the array:

{-# LANGUAGE BangPatterns #-}

{-# INLINE fib #-}
fib !_ 0 = return 0
fib !_ 1 = return 1
fib !c n = do
  f1 <- memo c (fib c) (n-1)
  f2 <- memo c (fib c) (n-2)
  return (f1+f2)

Then it just works:

$ time ./A
-2092835058118682368
./A  0.03s user 0.01s system 89% cpu 0.052 total

So why does it leak with one type, but not another? You just got lucky with Int64 I think. But I would consider this a "bug", probably in the rewrite rules for the various num types.

Looking at the output of the simplifier , we get quite a different run of rewrite rules firing in the Int64 case vs the Int case. As underlying functions are often indexed by Int, you do end up exercising different optimizations by using the common Int type. In this case, sufficient to confuse the strictness analyzer.

As always, the general rules apply: give the compiler more hints and you get better code.

like image 69
Don Stewart Avatar answered Nov 08 '22 21:11

Don Stewart


Looking at the core from 7.6.1, with -O2 and -dsuppress-uniques, the function that does the work, Main.main_$spoly_$wa is structurally (almost) identical whether I use int or Int64 as the index type. Since the core is long and complicated, here's the diff output:

$ diff Int_core.dump-simpl Int64_core.dump-simpl 
11,12c11,12
<           (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)
<           (Main.C (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int))
---
>           (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int)
>           (Main.C (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int))
26,27c26,27
<             (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)
<             (Main.C (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)))
---
>             (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int)
>             (Main.C (Data.Array.Base.STUArray s GHC.Int.Int64 GHC.Types.Int)))

Different index types, these are of course different.

33,40d32
<           l :: GHC.Types.Int
<           [LclId]
<           l = GHC.Types.I# sc } in
<         let {
<           u :: GHC.Types.Int
<           [LclId]
<           u = GHC.Types.I# sc1 } in
<         let {

For index type Int, GHC produces somewhat more informative errors for out-of-bounds indices, for that it needs the lower and upper bound of the valid indices. (The default implementation of index is not overridden in the instance Ix Int64.)

45,46c37
<           GHC.Types.False ->
<             case poly_$w$j5 (GHC.Types.I# a) l u of wild2 { };
---
>           GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild1 { };

Different error, indexError vs. hopelessIndexError. The following differences also only concern index errors.

49,50c40
<               GHC.Types.False ->
<                 case poly_$w$j5 (GHC.Types.I# a) l u of wild2 { };
---
>               GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild2 { };
58c48
<                     case poly_$w$j4 y (GHC.Types.I# sc2) of wild3 { };
---
>                     case poly_$w$j3 y (GHC.Types.I# sc2) of wild4 { };
62c52
<                         case poly_$w$j4 y (GHC.Types.I# sc2) of wild5 { };
---
>                         case poly_$w$j3 y (GHC.Types.I# sc2) of wild5 { };
77,78c67
<                                 GHC.Types.False ->
<                                   case poly_$w$j3 (GHC.Types.I# a1) l u of wild6 { };
---
>                                 GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild6 { };
81,82c70
<                                     GHC.Types.False ->
<                                       case poly_$w$j3 (GHC.Types.I# a1) l u of wild7 { };
---
>                                     GHC.Types.False -> case GHC.Arr.hopelessIndexError of wild7 { };

Now once more the different index type:

110c98
<                                                                       GHC.Types.Int
---
>                                                                       GHC.Int.Int64
152c140
<                                                 s GHC.Types.Int GHC.Types.Int>)>)
---
>                                                 s GHC.Int.Int64 GHC.Types.Int>)>)

And finally, 0 and 1 have gotten different top-level names.

177,178c165,166
<       0 -> (# sc5, lvl5 #);
<       1 -> (# sc5, lvl6 #)
---
>       0 -> (# sc5, lvl #);
>       1 -> (# sc5, lvl1 #)

So the entire code that does the actual work is identical. Yet the one causes a stack overflow (though only just, -K9M suffices [-K8731K is enough here, -K8730K not]) and the other doesn't.

The difference is indeed caused by the index errors. The code with Int indices allocates two boxed Ints in every recursive call that the Int64 code doesn't allocate, because

Main.main_$spoly_$wa [Occ=LoopBreaker]
  :: forall s.
     GHC.Prim.Int#
     -> GHC.Prim.Int#
     -> GHC.Prim.Int#
     -> GHC.Prim.MutableByteArray# s
     -> (GHC.Prim.~#)
          *
          (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int)
          (Main.C (Data.Array.Base.STUArray s GHC.Types.Int GHC.Types.Int))
     -> GHC.Prim.Int#
     -> GHC.Prim.State# s
     -> (# GHC.Prim.State# s, GHC.Types.Int #)

carries around two references to the array.

That uses more stack, and these boxed Ints have to be garbage collected, which causes the much larger GC figures. Additionally, the thunk for the index error is a bit larger than the hopelessIndexError thunk.

Now, if you help the compiler by

  • removing the newtype wrapper
  • making the function strict in the array (by bang patterns or data C a = C !a)

or some other ways, it produces better code that manages without a stack overflow for the given argument, since there is only one reference to the array in the worker, and thus the allocation of the boxed Ints for the bounds isn't needed.

Note that this algorithm causes a stack overflow for slightly larger arguments even with the help for the compiler.

like image 5
Daniel Fischer Avatar answered Nov 08 '22 21:11

Daniel Fischer