Word foldl' isn't optimized as well as Int foldl'

Question

import Data.List

test :: Int -> Int
test n = foldl' (+) 0 [1..n]

main :: IO ()
main = do
  print $ test $ 10^8

GHC optimizes the above code to the point that the garbage collector doesn't even have to do anything:

$ ghc -rtsopts -O2 testInt && ./testInt +RTS -s
[1 of 1] Compiling Main             ( testInt.hs, testInt.o )
Linking testInt ...
5000000050000000
          51,752 bytes allocated in the heap
           3,480 bytes copied during GC
          44,384 bytes maximum residency (1 sample(s))
          17,056 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.101s  (  0.101s elapsed)
  GC      time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.103s  (  0.102s elapsed)

  %GC     time       0.1%  (0.1% elapsed)

  Alloc rate    511,162 bytes per MUT second

  Productivity  99.8% of total user, 100.9% of total elapsed

However, if I change the type of test to test :: Word -> Word, then a lot of garbage is produced and the code runs 40x slower.

ghc -rtsopts -O2 testWord && ./testWord +RTS -s
[1 of 1] Compiling Main             ( testWord.hs, testWord.o )
Linking testWord ...
5000000050000000
  11,200,051,784 bytes allocated in the heap
       1,055,520 bytes copied during GC
          44,384 bytes maximum residency (2 sample(s))
          21,152 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     21700 colls,     0 par    0.077s   0.073s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.000s   0.000s     0.0001s    0.0001s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    4.551s  (  4.556s elapsed)
  GC      time    0.077s  (  0.073s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    4.630s  (  4.630s elapsed)

  %GC     time       1.7%  (1.6% elapsed)

  Alloc rate    2,460,957,186 bytes per MUT second

  Productivity  98.3% of total user, 98.3% of total elapsed

Why does this happen? I expected the performance to be nearly identical? (I'm using GHC version 8.0.1 on x86_64 GNU/Linux)

edit: I submitted a bug: https://ghc.haskell.org/trac/ghc/ticket/12354#ticket

Thomas M. DuBuisson · Accepted Answer

This is probably mostly, though not exclusively, due to rewrite rules that exist for Int and not Word. I say that because if we use -fno-enable-rewrite-rules on the Int case we get a time that is much closer to, but not quite as bad as, the Word case.

% ghc -O2 so.hs -fforce-recomp -fno-enable-rewrite-rules && time ./so
[1 of 1] Compiling Main             ( so.hs, so.o )
Linking so ...
5000000050000000
./so  1.45s user 0.03s system 99% cpu 1.489 total

If we dump the rewrite rules with -ddump-rule-rewrites and diff these rules then we see a rule that fires in the Int case and not the Word case:

 Rule: fold/build
 Before: GHC.Base.foldr
 ...

That particular rule is in Base 4.9 GHC.Base line 823 (N.B. I'm actually using GHC 7.10 myself) and does not mention Int explicitly. I'm curious why it isn't firing for Word, but don't have the time right now to investigate further.

Joachim Breitner · Answer

As pointed out by dfeuer in a comment here, the Enum instance for Int is better than the one for Word:

Int:

instance  Enum Int  where
    {-# INLINE enumFromTo #-}
    enumFromTo (I# x) (I# y) = eftInt x y

{-# RULES
"eftInt"        [~1] forall x y. eftInt x y = build (\ c n -> eftIntFB c n x y)
"eftIntList"    [1] eftIntFB  (:) [] = eftInt
 #-}
{- Note [How the Enum rules work]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Phase 2: eftInt ---> build . eftIntFB
* Phase 1: inline build; eftIntFB (:) --> eftInt
* Phase 0: optionally inline eftInt
-}

{-# NOINLINE [1] eftInt #-}
eftInt :: Int# -> Int# -> [Int]
-- [x1..x2]
eftInt x0 y | isTrue# (x0 ># y) = []
            | otherwise         = go x0
               where
                 go x = I# x : if isTrue# (x ==# y)
                               then []
                               else go (x +# 1#)

{-# INLINE [0] eftIntFB #-}
eftIntFB :: (Int -> r -> r) -> r -> Int# -> Int# -> r
eftIntFB c n x0 y | isTrue# (x0 ># y) = n
                  | otherwise         = go x0
                 where
                   go x = I# x `c` if isTrue# (x ==# y)
                                   then n
                                   else go (x +# 1#)
                        -- Watch out for y=maxBound; hence ==, not >
        -- Be very careful not to have more than one "c"
        -- so that when eftInfFB is inlined we can inline
        -- whatever is bound to "c"

Now Word actually uses the implementation for Integer

enumFromTo n1 n2       = map integerToWordX [wordToIntegerX n1 .. wordToIntegerX n2]

which uses

instance  Enum Integer  where
    enumFromTo x lim       = enumDeltaToInteger x 1     lim

Now enumDeltaToInteger has rewrite rules set up, but it turns out that Word’s enumFromTo is never inlined, so this setup has no chance of fusing here.

Copying this function into my test code causes GHC to inline it, the fold/build rule to fire, and cuts down allocation severely, but the conversion from and to Integer (which does allocate) remains.

Word foldl' isn't optimized as well as Int foldl'

Tags:

optimization

haskell

ghc

Kevin

2 Answers

Thomas M. DuBuisson

Joachim Breitner

Recent Activity

Donate For Us

Word foldl' isn't optimized as well as Int foldl'

Tags:

optimization

haskell

ghc

Kevin

2 Answers

Thomas M. DuBuisson

Joachim Breitner

Related questions

Recent Activity

Donate For Us