If I define the Kolakoski Sequence as <pre class="prettyprint"><code>kolakoski :: () -> [Int] kolakoski () = 1 : 2 : helper () where helper () = 2 : concat (zipWith replicate (helper ()) (cycle [1, 2])) </code></pre> and find the 500,000,000th term with <pre class="prettyprint"><code>kolakoski () !! 500000000 </code></pre> I find that when compiled with ghc -O this quickly consumes vast amounts of memory. But with optimizations turned off it uses virtually nothing. Which optimization is causing this and how do I turn it off?

Let's compare actual numbers. Your version of <code>kolakoski</code> uses about 70k if run without optimization: <pre class="prettyprint"> $ ghc --make Kolakoski-Unit && ./Kolakoski-Unit +RTS -s 2 288,002,359,096 bytes allocated in the heap 1,343,933,816 bytes copied during GC 67,576 bytes maximum residency (422 sample(s)) 52,128 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 551615 colls, 0 par 1.89s 2.30s 0.0000s 0.0001s Gen 1 422 colls, 0 par 0.02s 0.02s 0.0001s 0.0001s INIT time 0.00s ( 0.00s elapsed) MUT time 37.34s ( 37.25s elapsed) GC time 1.91s ( 2.33s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 39.25s ( 39.58s elapsed) %GC time 4.9% (5.9% elapsed) Alloc rate 7,712,197,063 bytes per MUT second Productivity 95.1% of total user, 94.3% of total elapsed </pre> With optimization, it uses about ~4GB (althhough the actual memory usage in the task manager goes up to ~6GB). <pre class="prettyprint"> $ ghc --make Kolakoski-Unit -O && ./Kolakoski-Unit +RTS -s 2 64,000,066,608 bytes allocated in the heap 27,971,527,816 bytes copied during GC 3,899,031,480 bytes maximum residency (34 sample(s)) 91,679,728 bytes maximum slop 9549 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 122806 colls, 0 par 8.67s 9.48s 0.0001s 0.0148s Gen 1 34 colls, 0 par 11.55s 69.78s 2.0524s 56.2970s INIT time 0.00s ( 0.00s elapsed) MUT time 8.80s ( 8.43s elapsed) GC time 20.22s ( 79.26s elapsed) EXIT time 0.03s ( 0.89s elapsed) Total time 29.05s ( 88.58s elapsed) %GC time 69.6% (89.5% elapsed) Alloc rate 7,275,318,406 bytes per MUT second Productivity 30.4% of total user, 10.0% of total elapsed </pre> If we use a list based version and no optimization, the memory consumption is very similar to the one with optimizations enabled: <pre class="prettyprint"><code>kolakoskiList :: [Int] kolakoskiList = 1 : 2 : helper where helper = 2 : concat (zipWith replicate helper (cycle [1, 2])) </code></pre> <pre class="prettyprint"> $ ghc --make Kolakoski-List && ./Kolakoski-List +RTS -s 2 96,000,143,328 bytes allocated in the heap 26,615,974,536 bytes copied during GC 3,565,429,808 bytes maximum residency (34 sample(s)) 83,610,688 bytes maximum slop 8732 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 184252 colls, 0 par 9.98s 10.16s 0.0001s 0.0006s Gen 1 34 colls, 0 par 10.45s 21.61s 0.6357s 12.0792s INIT time 0.00s ( 0.00s elapsed) MUT time 12.02s ( 11.88s elapsed) GC time 20.44s ( 31.77s elapsed) EXIT time 0.05s ( 0.69s elapsed) Total time 32.50s ( 44.34s elapsed) %GC time 62.9% (71.7% elapsed) Alloc rate 7,989,608,807 bytes per MUT second Productivity 37.1% of total user, 27.2% of total elapsed </pre> Now, we can check the GHC flag reference for the flags that get automatically set on <code>-O</code>. Since the list version seems to do the same thing as the optimized one, one might think that GHC transforms <code>kolakoski</code> to <code>kolakoskiList</code>: <pre class="prettyprint"><code>kolakoskiOptimized _ = kolakoskiList </code></pre> Let's check this in core with <code>-ddump-simpl -dsupress-all</code>: <pre class="prettyprint"><code>==================== Tidy Core ==================== Result size of Tidy Core = {terms: 45, types: 30, coercions: 0} kolakoski kolakoski = \ ds_d10r -> case ds_d10r of _ { () -> : (I# 1) (: (I# 2) (letrec { helper_aNo helper_aNo = \ ds1_d10s -> case ds1_d10s of _ { () -> : (I# 2) (concat (zipWith (replicate) (helper_aNo ()) (cycle (: (I# 1) (: (I# 2) ([])))))) }; } in helper_aNo ())) } main main = print $fShowInt (!! (kolakoski ()) (I# 500000000)) main main = runMainIO main </code></pre> Even if you're not familiar with GHC's core, you can see that <code>kolakoski</code> is mostly the same as your original version. Now compare this with <code>-O -ddump-simpl -dsupress-all</code>: <pre class="prettyprint"><code>==================== Tidy Core ==================== Result size of Tidy Core = {terms: 125, types: 102, coercions: 9} kolakoski6 kolakoski6 = I# 1 kolakoski5 kolakoski5 = I# 2 Rec { go_r1NG go_r1NG = \ ds_a14B _ys_a14C -> case ds_a14B of _ { [] -> []; : ipv_a14H ipv1_a14I -> case _ys_a14C of _ { [] -> []; : ipv2_a14O ipv3_a14P -> case ipv_a14H of _ { I# n#_a13J -> case tagToEnum# (<=# n#_a13J 0) of _ { False -> let { lvl2_s1N3 lvl2_s1N3 = : ipv2_a14O ([]) } in letrec { xs_a1LH xs_a1LH = \ m_a1LO -> case tagToEnum# (<=# m_a1LO 1) of _ { False -> : ipv2_a14O (xs_a1LH (-# m_a1LO 1)); True -> lvl2_s1N3 }; } in ++ (xs_a1LH n#_a13J) (go_r1NG ipv1_a14I ipv3_a14P); True -> ++ ([]) (go_r1NG ipv1_a14I ipv3_a14P) } } } } end Rec } lvl_r1NH lvl_r1NH = : kolakoski5 ([]) lvl1_r1NI lvl1_r1NI = : kolakoski6 lvl_r1NH Rec { xs'_r1NJ xs'_r1NJ = ++ lvl1_r1NI xs'_r1NJ end Rec } Rec { kolakoski3 kolakoski3 = : kolakoski5 kolakoski4 kolakoski4 kolakoski4 = go_r1NG kolakoski3 xs'_r1NJ end Rec } kolakoski2 kolakoski2 = : kolakoski5 kolakoski3 kolakoski1 kolakoski1 = : kolakoski6 kolakoski2 kolakoski kolakoski = \ ds_d13p -> case ds_d13p of _ { () -> kolakoski1 } </code></pre> This version contains several top-level CAFs, which are being retained for the lifetime of the program. So you really generate the list up to the 500,000,000th value and save it. Now, what happened there? Something that was inside of your function floated outwards. Let's check the flag reference again. There's a promising flag, which is implied by <code>-O</code>: <blockquote> <code>-ffull-laziness</code> Turn on full laziness (floating bindings outwards). Implied by <code>-O</code>. </blockquote> And that's the flag that lead to your problems. Indeed, you can use <code>ghc --make -O -fno-full-laziness Kolakoski-Unit.hs</code> to get your original memory consumption: <pre class="prettyprint"> $ ghc --make Kolakoski-Unit.hs -O -fno-full-laziness && ./Kolakoski-Unit +RTS -s 2 192,001,417,688 bytes allocated in the heap 637,962,464 bytes copied during GC 66,104 bytes maximum residency (151 sample(s)) 43,448 bytes maximum slop 2 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 368364 colls, 0 par 1.34s 1.32s 0.0000s 0.0002s Gen 1 151 colls, 0 par 0.00s 0.01s 0.0001s 0.0003s INIT time 0.00s ( 0.00s elapsed) MUT time 27.89s ( 28.13s elapsed) GC time 1.34s ( 1.33s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 29.25s ( 29.46s elapsed) %GC time 4.6% (4.5% elapsed) Alloc rate 6,884,084,443 bytes per MUT second Productivity 95.4% of total user, 94.7% of total elapsed </pre> <h3>Related questions</h3> <ul> <li>How to make a CAF not a CAF in Haskell?</li> </ul>

Why aren't the earlier terms here being garbage-collected?

Tags:

memory

compiler-optimization

garbage-collection

haskell

ghc

If I define the Kolakoski Sequence as

kolakoski :: () -> [Int]
kolakoski () = 1 : 2 : helper ()
  where
    helper () = 2 : concat (zipWith replicate (helper ()) (cycle [1, 2]))

and find the 500,000,000th term with

kolakoski () !! 500000000

I find that when compiled with ghc -O this quickly consumes vast amounts of memory. But with optimizations turned off it uses virtually nothing. Which optimization is causing this and how do I turn it off?

467

asked May 05 '15 03:05

dspyz

1 Answers

Let's compare actual numbers. Your version of kolakoski uses about 70k if run without optimization:

$ ghc --make Kolakoski-Unit && ./Kolakoski-Unit +RTS -s
2
 288,002,359,096 bytes allocated in the heap
   1,343,933,816 bytes copied during GC
          67,576 bytes maximum residency (422 sample(s))
          52,128 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     551615 colls,     0 par    1.89s    2.30s     0.0000s    0.0001s
  Gen  1       422 colls,     0 par    0.02s    0.02s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   37.34s  ( 37.25s elapsed)
  GC      time    1.91s  (  2.33s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   39.25s  ( 39.58s elapsed)

  %GC     time       4.9%  (5.9% elapsed)

  Alloc rate    7,712,197,063 bytes per MUT second

  Productivity  95.1% of total user, 94.3% of total elapsed

With optimization, it uses about ~4GB (althhough the actual memory usage in the task manager goes up to ~6GB).

$ ghc --make Kolakoski-Unit -O && ./Kolakoski-Unit +RTS -s
2
  64,000,066,608 bytes allocated in the heap
  27,971,527,816 bytes copied during GC
   3,899,031,480 bytes maximum residency (34 sample(s))
      91,679,728 bytes maximum slop
            9549 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     122806 colls,     0 par    8.67s    9.48s     0.0001s    0.0148s
  Gen  1        34 colls,     0 par   11.55s   69.78s     2.0524s    56.2970s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    8.80s  (  8.43s elapsed)
  GC      time   20.22s  ( 79.26s elapsed)
  EXIT    time    0.03s  (  0.89s elapsed)
  Total   time   29.05s  ( 88.58s elapsed)

  %GC     time      69.6%  (89.5% elapsed)

  Alloc rate    7,275,318,406 bytes per MUT second

  Productivity  30.4% of total user, 10.0% of total elapsed

If we use a list based version and no optimization, the memory consumption is very similar to the one with optimizations enabled:

kolakoskiList :: [Int]
kolakoskiList = 1 : 2 : helper 
  where
    helper = 2 : concat (zipWith replicate helper (cycle [1, 2]))

$ ghc --make Kolakoski-List && ./Kolakoski-List +RTS -s
2
  96,000,143,328 bytes allocated in the heap
  26,615,974,536 bytes copied during GC
   3,565,429,808 bytes maximum residency (34 sample(s))
      83,610,688 bytes maximum slop
            8732 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     184252 colls,     0 par    9.98s   10.16s     0.0001s    0.0006s
  Gen  1        34 colls,     0 par   10.45s   21.61s     0.6357s    12.0792s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   12.02s  ( 11.88s elapsed)
  GC      time   20.44s  ( 31.77s elapsed)
  EXIT    time    0.05s  (  0.69s elapsed)
  Total   time   32.50s  ( 44.34s elapsed)

  %GC     time      62.9%  (71.7% elapsed)

  Alloc rate    7,989,608,807 bytes per MUT second

  Productivity  37.1% of total user, 27.2% of total elapsed

Now, we can check the GHC flag reference for the flags that get automatically set on -O. Since the list version seems to do the same thing as the optimized one, one might think that GHC transforms kolakoski to kolakoskiList:

kolakoskiOptimized _ = kolakoskiList

Let's check this in core with -ddump-simpl -dsupress-all:

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 45, types: 30, coercions: 0}

kolakoski
kolakoski =
  \ ds_d10r ->
    case ds_d10r of _ { () ->
    : (I# 1)
      (: (I# 2)
         (letrec {
            helper_aNo
            helper_aNo =
              \ ds1_d10s ->
                case ds1_d10s of _ { () ->
                : (I# 2)
                  (concat
                     (zipWith
                        (replicate) (helper_aNo ()) (cycle (: (I# 1) (: (I# 2) ([]))))))
                }; } in
          helper_aNo ()))
    }

main
main = print $fShowInt (!! (kolakoski ()) (I# 500000000))

main
main = runMainIO main

Even if you're not familiar with GHC's core, you can see that kolakoski is mostly the same as your original version. Now compare this with -O -ddump-simpl -dsupress-all:

==================== Tidy Core ====================
Result size of Tidy Core = {terms: 125, types: 102, coercions: 9}

kolakoski6
kolakoski6 = I# 1

kolakoski5
kolakoski5 = I# 2

Rec {
go_r1NG
go_r1NG =
  \ ds_a14B _ys_a14C ->
    case ds_a14B of _ {
      [] -> [];
      : ipv_a14H ipv1_a14I ->
        case _ys_a14C of _ {
          [] -> [];
          : ipv2_a14O ipv3_a14P ->
            case ipv_a14H of _ { I# n#_a13J ->
            case tagToEnum# (<=# n#_a13J 0) of _ {
              False ->
                let {
                  lvl2_s1N3
                  lvl2_s1N3 = : ipv2_a14O ([]) } in
                letrec {
                  xs_a1LH
                  xs_a1LH =
                    \ m_a1LO ->
                      case tagToEnum# (<=# m_a1LO 1) of _ {
                        False -> : ipv2_a14O (xs_a1LH (-# m_a1LO 1));
                        True -> lvl2_s1N3
                      }; } in
                ++ (xs_a1LH n#_a13J) (go_r1NG ipv1_a14I ipv3_a14P);
              True -> ++ ([]) (go_r1NG ipv1_a14I ipv3_a14P)
            }
            }
        }
    }
end Rec }

lvl_r1NH
lvl_r1NH = : kolakoski5 ([])

lvl1_r1NI
lvl1_r1NI = : kolakoski6 lvl_r1NH

Rec {
xs'_r1NJ
xs'_r1NJ = ++ lvl1_r1NI xs'_r1NJ
end Rec }

Rec {
kolakoski3
kolakoski3 = : kolakoski5 kolakoski4

kolakoski4
kolakoski4 = go_r1NG kolakoski3 xs'_r1NJ
end Rec }

kolakoski2
kolakoski2 = : kolakoski5 kolakoski3

kolakoski1
kolakoski1 = : kolakoski6 kolakoski2

kolakoski
kolakoski = \ ds_d13p -> case ds_d13p of _ { () -> kolakoski1 }

This version contains several top-level CAFs, which are being retained for the lifetime of the program. So you really generate the list up to the 500,000,000th value and save it.

Now, what happened there? Something that was inside of your function floated outwards. Let's check the flag reference again. There's a promising flag, which is implied by -O:

-ffull-laziness Turn on full laziness (floating bindings outwards). Implied by -O.

And that's the flag that lead to your problems. Indeed, you can use ghc --make -O -fno-full-laziness Kolakoski-Unit.hs to get your original memory consumption:

$ ghc --make Kolakoski-Unit.hs -O -fno-full-laziness && ./Kolakoski-Unit +RTS -s
2
 192,001,417,688 bytes allocated in the heap
     637,962,464 bytes copied during GC
          66,104 bytes maximum residency (151 sample(s))
          43,448 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     368364 colls,     0 par    1.34s    1.32s     0.0000s    0.0002s
  Gen  1       151 colls,     0 par    0.00s    0.01s     0.0001s    0.0003s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   27.89s  ( 28.13s elapsed)
  GC      time    1.34s  (  1.33s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   29.25s  ( 29.46s elapsed)

  %GC     time       4.6%  (4.5% elapsed)

  Alloc rate    6,884,084,443 bytes per MUT second

  Productivity  95.4% of total user, 94.7% of total elapsed

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why aren't the earlier terms here being garbage-collected?

Tags:

memory

compiler-optimization

garbage-collection

haskell

ghc

dspyz

People also ask

1 Answers

Related questions

2 revs

Recent Activity

Donate For Us