
Haskell: performance of where clause

I was analyzing the effect of where clauses on performance of Haskell programs.

In Haskell: The Craft of Functional Programming (Thompson), chapter 20.4, I found the following example:

exam1 :: Int -> [Int]
exam1 n = [1 .. n] ++ [1 .. n]

exam2 :: Int -> [Int]
exam2 n = list ++ list
  where list = [1 .. n]

and, I quote,

The time taken to calculate [exam1] will be O(n), and the space used will be O(1), but we will have to calculate the expression [1 .. n] twice.

...

The effect [of exam2] is to compute the list [1 .. n] once, so that we save its value after calculating it in order to be able to use it again.

...

If we save something by referring to it in a where clause, we have to pay the penalty of the space that it occupies.

So I go ahead and assume that the -O2 flag must handle this and choose the best behavior for me. I benchmark the runtime of these two examples using Criterion.

import Criterion.Main

exam1 :: Int -> [Int]
exam1 n = [1 .. n] ++ [1 .. n]

exam2 :: Int -> [Int]
exam2 n = list ++ list
  where list = [1 .. n]

m :: Int
m = 1000000

main :: IO ()
main = defaultMain [ bench "exam1" $ nf exam1 m
                   , bench "exam2" $ nf exam2 m
                   ]

I compile with -O2, and find:

benchmarking exam1
time                 15.11 ms   (15.03 ms .. 15.16 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 15.11 ms   (15.08 ms .. 15.14 ms)
std dev              83.20 μs   (53.18 μs .. 122.6 μs)

benchmarking exam2
time                 76.27 ms   (72.84 ms .. 82.75 ms)
                     0.987 R²   (0.963 R² .. 0.997 R²)
mean                 74.79 ms   (70.20 ms .. 77.70 ms)
std dev              6.204 ms   (3.871 ms .. 9.233 ms)
variance introduced by outliers: 26% (moderately inflated)

What a difference! Why would that be? According to the quote above, I expected exam2 to be faster but to use more memory. Instead, it is actually much slower (and probably uses more memory too, but that would need to be tested).

Maybe it is slower because [1 .. 1e6] has to be stored in memory, and this takes a lot of time. What do you think?

PS: I found a question that looked related, but it does not really answer this.

asked Apr 29 '19 by Dominik Schrempf

1 Answer

You can inspect GHC Core using -ddump-simpl and observe the optimized code GHC produced. Core is not as readable as Haskell, but usually one can still get the idea of what is going on.
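If you want to reproduce dumps like the ones below, a command along these lines works, assuming the code from the question lives in a file called PerfList.hs (that name is only a guess from the PerfList.$wexam1 symbol appearing in the output); -dsuppress-all is optional and merely trims the annotations:

ghc -O2 -ddump-simpl -dsuppress-all -fforce-recomp PerfList.hs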

For exam2 we get plain boring code:

exam2
  = \ (n_aX5 :: Int) ->
      case n_aX5 of { GHC.Types.I# y_a1lJ ->
      let {
        list_s1nF [Dmd=<S,U>] :: [Int]
        [LclId]
        list_s1nF = GHC.Enum.eftInt 1# y_a1lJ } in
      ++ @ Int list_s1nF list_s1nF
      }

Roughly, this defines list_s1nF as [1..n] (eftInt = enum from to) and calls ++. No inlining happened here. GHC was reluctant to inline list_s1nF since it is used twice, and inlining a definition in such a case can be harmful. Indeed, if let x = expensive in x+x is inlined, expensive might get recomputed twice, which is bad. Here GHC trusts the programmer, assuming that if they used a let / where they want that value to be computed only once. Failing to inline list_s1nF prevents further optimization.
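To make that trade-off concrete, here is a small hand-written pair (my own illustration, not taken from the book or from the Core above): the where-bound version computes the sum once and keeps it alive across both uses, while the other spells out what inlining the binding would mean, namely two traversals.

shared :: Int -> Int
shared n = x + x
  where x = sum [1 .. n]   -- computed once; the value is kept alive for both uses

recomputed :: Int -> Int
recomputed n = sum [1 .. n] + sum [1 .. n]   -- what inlining x would produce: two traversals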

So this code allocates list = [1..n] and then copies it, resulting in 1:2:...:n:list, where the tail pointer is made to point to the original list. Copying an arbitrary list requires following a pointer chain and allocating cells for the new list, which is intuitively more expensive than producing [1..n] directly, which only needs to allocate the cells and keep a counter around.
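For reference, (++) behaves essentially like the following sketch of its standard definition: it allocates a fresh cons cell for every element of its first argument and only shares the second argument, which is exactly the copying described above.

append :: [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys   -- a new (:) cell per element of xs; ys itself is shared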

By contrast, exam1 is optimized much further: after some minor unboxing

exam1
  = \ (w_s1os :: Int) ->
      case w_s1os of { GHC.Types.I# ww1_s1ov ->
      PerfList.$wexam1 ww1_s1ov
      }

we get to the actual worker function.

PerfList.$wexam1
  = \ (ww_s1ov :: GHC.Prim.Int#) ->
      let {
        n_a1lT :: [Int]
        [LclId]
        n_a1lT = GHC.Enum.eftInt 1# ww_s1ov } in
      case GHC.Prim.># 1# ww_s1ov of {
        __DEFAULT ->
          letrec {
            go_a1lX [Occ=LoopBreaker] :: GHC.Prim.Int# -> [Int]
            [LclId, Arity=1, Str=<L,U>, Unf=OtherCon []]
            go_a1lX
              = \ (x_a1lY :: GHC.Prim.Int#) ->
                  GHC.Types.:
                    @ Int
                    (GHC.Types.I# x_a1lY)
                    (case GHC.Prim.==# x_a1lY ww_s1ov of {
                       __DEFAULT -> go_a1lX (GHC.Prim.+# x_a1lY 1#);
                       1# -> n_a1lT
                     }); } in
          go_a1lX 1#;
        1# -> n_a1lT
      }

Here, the first "enum from to" [1..n] was inlined, and that also triggered the inlining of ++. The resulting recursive function go_a1lX relies only on : and basic arithmetic. When the recursion is over, n_a1lT is returned, which is the second "enum from to" [1..n]. That one is not inlined, since it would not trigger any further optimization.

Here, no list is generated and then copied, so we get better performance.
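Written back as source Haskell, the worker above corresponds roughly to this (a sketch with names of my own choosing, not GHC output):

exam1' :: Int -> [Int]
exam1' n
  | 1 > n     = rest                 -- empty range: the result is just []
  | otherwise = go 1
  where
    rest = [1 .. n]                  -- the second [1 .. n], left un-inlined
    go x
      | x == n    = x : rest         -- last element: splice in the second list
      | otherwise = x : go (x + 1)   -- build the first copy cell by cell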

Note that this also produces optimized code:

exam3 :: Int -> [Int]
exam3 n = list1 ++ list2
  where list1 = [1 .. n]
        list2 = [1 .. n]

as does the following, since GHC won't automatically cache the results of function calls, so the two calls to list can be inlined:

exam4 :: Int -> [Int]
exam4 n = list () ++ list ()
  where list () = [1 .. n]
answered Sep 18 '22 by chi