I was analyzing the effect of where clauses on the performance of Haskell programs.
In Haskell: The Craft of Functional Programming (Thompson, section 20.4), I found the following example:
exam1 :: Int -> [Int]
exam1 n = [1 .. n] ++ [1 .. n]
exam2 :: Int -> [Int]
exam2 n = list ++ list
  where list = [1 .. n]
and, I quote:

"The time taken to calculate [exam1] will be O(n), and the space used will be O(1), but we will have to calculate the expression [1 .. n] twice. ... The effect [of exam2] is to compute the list [1 .. n] once, so that we save its value after calculating it in order to be able to use it again. ... If we save something by referring to it in a where clause, we have to pay the penalty of the space that it occupies."
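The trade-off the book describes can be sketched directly (a hedged sketch; the names recompute and share are mine, not the book's):

```haskell
-- A sketch of the time/space trade-off described in the quote.

-- Enumerates [1 .. n] twice; each element can be garbage-collected
-- as soon as `sum` has consumed it, so residency stays O(1).
recompute :: Int -> Int
recompute n = sum [1 .. n] + sum [1 .. n]

-- Enumerates [1 .. n] once, but the shared list stays live in full
-- between the first and the second traversal: O(n) residency.
share :: Int -> Int
share n = sum list + sum list
  where list = [1 .. n]
```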
So I go wild and think that the -O2
flag must handle this and choose the best behavior for me. I benchmark the two examples using Criterion:
import Criterion.Main
exam1 :: Int -> [Int]
exam1 n = [1 .. n] ++ [1 .. n]
exam2 :: Int -> [Int]
exam2 n = list ++ list
  where list = [1 .. n]
m :: Int
m = 1000000
main :: IO ()
main = defaultMain [ bench "exam1" $ nf exam1 m
, bench "exam2" $ nf exam2 m
]
I compile with -O2
, and find:
benchmarking exam1
time 15.11 ms (15.03 ms .. 15.16 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 15.11 ms (15.08 ms .. 15.14 ms)
std dev 83.20 μs (53.18 μs .. 122.6 μs)
benchmarking exam2
time 76.27 ms (72.84 ms .. 82.75 ms)
0.987 R² (0.963 R² .. 0.997 R²)
mean 74.79 ms (70.20 ms .. 77.70 ms)
std dev 6.204 ms (3.871 ms .. 9.233 ms)
variance introduced by outliers: 26% (moderately inflated)
What a difference! Why would that be? I thought that exam2
should be faster but less memory efficient (according to the quote above). But no, it is actually much slower (and probably also less memory efficient, though that would need to be tested).
Maybe it is slower because [1 .. 1e6]
has to be stored in memory, and this takes a lot of time. What do you think?
PS: I found a possibly related question, but not really.
You can inspect GHC Core using -ddump-simpl
and observe the optimized code GHC produced. Core is not as readable as Haskell, but usually one can still get the idea of what is going on.
For exam2
we get plain boring code:
exam2
= \ (n_aX5 :: Int) ->
case n_aX5 of { GHC.Types.I# y_a1lJ ->
let {
list_s1nF [Dmd=<S,U>] :: [Int]
[LclId]
list_s1nF = GHC.Enum.eftInt 1# y_a1lJ } in
++ @ Int list_s1nF list_s1nF
}
Roughly, this defines list_s1nF
as [1..n]
(eftInt
= enum from to) and calls ++
. No inlining happened here. GHC was afraid to inline list_s1nF
since it is used twice, and inlining a definition in such a case can be harmful. Indeed, if let x = expensive in x + x
is inlined, expensive
might get recomputed twice, which is bad. Here GHC trusts the programmer, thinking that if they used a let / where
they want that to be computed only once. Failing to inline list_s1nF
prevents further optimization.
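In ordinary source Haskell, that Core corresponds roughly to the following (a sketch with my own names, not GHC's output; enumFromTo plays the role of GHC.Enum.eftInt):

```haskell
-- Rough source-level equivalent of the Core shown above for exam2.
exam2Core :: Int -> [Int]
exam2Core n =
  let list = enumFromTo 1 n  -- the role played by GHC.Enum.eftInt 1# y
  in list ++ list            -- (++) applied to the same list twice
```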
So this code allocates list = [1..n]
, and then copies it, resulting in 1:2:...:n:list
where the tail pointer is made to point to the original list.
Copying an arbitrary list requires following a pointer chain and allocating cells for the new list; this is intuitively more expensive than [1..n],
which only needs to allocate the cells for the new list and keep a counter around.
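To see where the copying cost comes from, recall the standard definition of (++), reproduced here under the name append: it allocates a fresh (:) cell for every element of its first argument.

```haskell
-- The usual definition of list append: every element of xs is
-- re-allocated into a new cons cell before ys is reached.
append :: [a] -> [a] -> [a]
append []       ys = ys
append (x : xs) ys = x : append xs ys  -- one fresh (:) per element of xs
```

In exam2 both arguments are the same list, so all n cells of list are walked and copied before the shared tail is reached.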
Instead, exam1
is optimized much further: after some minor unboxing
exam1
= \ (w_s1os :: Int) ->
case w_s1os of { GHC.Types.I# ww1_s1ov ->
PerfList.$wexam1 ww1_s1ov
}
we get to the actual worker function.
PerfList.$wexam1
= \ (ww_s1ov :: GHC.Prim.Int#) ->
let {
n_a1lT :: [Int]
[LclId]
n_a1lT = GHC.Enum.eftInt 1# ww_s1ov } in
case GHC.Prim.># 1# ww_s1ov of {
__DEFAULT ->
letrec {
go_a1lX [Occ=LoopBreaker] :: GHC.Prim.Int# -> [Int]
[LclId, Arity=1, Str=<L,U>, Unf=OtherCon []]
go_a1lX
= \ (x_a1lY :: GHC.Prim.Int#) ->
GHC.Types.:
@ Int
(GHC.Types.I# x_a1lY)
(case GHC.Prim.==# x_a1lY ww_s1ov of {
__DEFAULT -> go_a1lX (GHC.Prim.+# x_a1lY 1#);
1# -> n_a1lT
}); } in
go_a1lX 1#;
1# -> n_a1lT
}
Here, the first "enum from to" [1..n]
was inlined, and that also triggered the inlining of ++
. The resulting recursive function go_a1lX
only relies on :
and basic arithmetic. When the recursion is over, n_a1lT
is returned, which is the second "enum from to" [1..n]
. This is not inlined, since inlining it would enable no further optimization.
Here, no list is generated and then copied, so we get better performance.
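Translated back to source Haskell, the worker is roughly this (a sketch with my own names, not GHC's output):

```haskell
-- Rough source-level equivalent of $wexam1: the first [1 .. n] is
-- produced directly by a counting loop whose final tail is the
-- second [1 .. n], so no intermediate list is built and then copied.
exam1Fused :: Int -> [Int]
exam1Fused n
  | 1 > n     = second            -- the `># 1# ww` branch
  | otherwise = go 1
  where
    second = [1 .. n]             -- n_a1lT: kept as a plain enumeration
    go x
      | x == n    = x : second    -- recursion ends; tail is the second list
      | otherwise = x : go (x + 1)
```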
Note that this also produces optimized code:
exam3 :: Int -> [Int]
exam3 n = list1 ++ list2
  where list1 = [1 .. n]
        list2 = [1 .. n]
The same goes for this version: GHC does not automatically cache the results of function calls, so list () can be inlined at each call site.
exam4 :: Int -> [Int]
exam4 n = list () ++ list ()
  where list () = [1 .. n]
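The sharing difference can be demonstrated with Debug.Trace (a sketch with my own names; compiled without optimization, the trace fires once for the let-bound value but once per call for the function of ()):

```haskell
import Debug.Trace (trace)

-- Let-bound value: computed once, shared between both uses,
-- so "computing x" is printed a single time.
shared :: Int
shared = let x = trace "computing x" (1 + 1) in x + x

-- Function of (): nothing is cached, so each call re-evaluates the
-- traced expression (this lack of sharing is what makes exam4 inlinable).
unshared :: Int
unshared = let f () = trace "computing f" (1 + 1) in f () + f ()
```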