import Data.List
test :: Int -> Int
test n = foldl' (+) 0 [1..n]
main :: IO ()
main = do
print $ test $ 10^8
GHC optimizes the above code to the point that the garbage collector doesn't even have to do anything:
$ ghc -rtsopts -O2 testInt && ./testInt +RTS -s
[1 of 1] Compiling Main ( testInt.hs, testInt.o )
Linking testInt ...
5000000050000000
51,752 bytes allocated in the heap
3,480 bytes copied during GC
44,384 bytes maximum residency (1 sample(s))
17,056 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s
INIT time 0.000s ( 0.000s elapsed)
MUT time 0.101s ( 0.101s elapsed)
GC time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 0.103s ( 0.102s elapsed)
%GC time 0.1% (0.1% elapsed)
Alloc rate 511,162 bytes per MUT second
Productivity 99.8% of total user, 100.9% of total elapsed
However, if I change the type of test
to test :: Word -> Word
, then a lot of garbage is produced and the code runs 40x slower.
ghc -rtsopts -O2 testWord && ./testWord +RTS -s
[1 of 1] Compiling Main ( testWord.hs, testWord.o )
Linking testWord ...
5000000050000000
11,200,051,784 bytes allocated in the heap
1,055,520 bytes copied during GC
44,384 bytes maximum residency (2 sample(s))
21,152 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 21700 colls, 0 par 0.077s 0.073s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.000s 0.000s 0.0001s 0.0001s
INIT time 0.000s ( 0.000s elapsed)
MUT time 4.551s ( 4.556s elapsed)
GC time 0.077s ( 0.073s elapsed)
EXIT time 0.000s ( 0.000s elapsed)
Total time 4.630s ( 4.630s elapsed)
%GC time 1.7% (1.6% elapsed)
Alloc rate 2,460,957,186 bytes per MUT second
Productivity 98.3% of total user, 98.3% of total elapsed
Why does this happen? I expected the performance to be nearly identical? (I'm using GHC version 8.0.1 on x86_64 GNU/Linux)
edit: I submitted a bug: https://ghc.haskell.org/trac/ghc/ticket/12354#ticket
This is probably mostly, though not exclusively, due to rewrite rules that exist for Int and not Word. I say that because if we use -fno-enable-rewrite-rules
on the Int
case we get a time that is much closer to, but not quite as bad as, the Word
case.
% ghc -O2 so.hs -fforce-recomp -fno-enable-rewrite-rules && time ./so
[1 of 1] Compiling Main ( so.hs, so.o )
Linking so ...
5000000050000000
./so 1.45s user 0.03s system 99% cpu 1.489 total
If we dump the rewrite rules with -ddump-rule-rewrites
and diff these rules then we see a rule that fires in the Int
case and not the Word
case:
Rule: fold/build
Before: GHC.Base.foldr
...
That particular rule is in Base 4.9 GHC.Base line 823 (N.B. I'm actually using GHC 7.10 myself) and does not mention Int
explicitly. I'm curious why it isn't firing for Word
, but don't have the time right now to investigate further.
As pointed out by dfeuer in a comment here, the Enum
instance for Int
is better than the one for Word
:
Int
:
instance Enum Int where
{-# INLINE enumFromTo #-}
enumFromTo (I# x) (I# y) = eftInt x y
{-# RULES
"eftInt" [~1] forall x y. eftInt x y = build (\ c n -> eftIntFB c n x y)
"eftIntList" [1] eftIntFB (:) [] = eftInt
#-}
{- Note [How the Enum rules work]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Phase 2: eftInt ---> build . eftIntFB
* Phase 1: inline build; eftIntFB (:) --> eftInt
* Phase 0: optionally inline eftInt
-}
{-# NOINLINE [1] eftInt #-}
eftInt :: Int# -> Int# -> [Int]
-- [x1..x2]
eftInt x0 y | isTrue# (x0 ># y) = []
| otherwise = go x0
where
go x = I# x : if isTrue# (x ==# y)
then []
else go (x +# 1#)
{-# INLINE [0] eftIntFB #-}
eftIntFB :: (Int -> r -> r) -> r -> Int# -> Int# -> r
eftIntFB c n x0 y | isTrue# (x0 ># y) = n
| otherwise = go x0
where
go x = I# x `c` if isTrue# (x ==# y)
then n
else go (x +# 1#)
-- Watch out for y=maxBound; hence ==, not >
-- Be very careful not to have more than one "c"
-- so that when eftInfFB is inlined we can inline
-- whatever is bound to "c"
Now Word
actually uses the implementation for Integer
enumFromTo n1 n2 = map integerToWordX [wordToIntegerX n1 .. wordToIntegerX n2]
which uses
instance Enum Integer where
enumFromTo x lim = enumDeltaToInteger x 1 lim
Now enumDeltaToInteger
has rewrite rules set up, but it turns out that Word
’s enumFromTo
is never inlined, so this setup has no chance of fusing here.
Copying this function into my test code causes GHC to inline it, the fold/build
rule to fire, and cuts down allocation severely, but the conversion from and to Integer
(which does allocate) remains.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With