Why do these fixpoint cata / ana morphism definitions outperform the recursive ones?

Question

Consider these definitions from a previous question:

type Algebra f a = f a -> a

cata :: Functor f => Algebra f b -> Fix f -> b
cata alg = alg . fmap (cata alg) . unFix

fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg = fix $ \f -> alg . fmap f . unFix

type CoAlgebra f a = a -> f a

ana :: Functor f => CoAlgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg

fixana :: Functor f => CoAlgebra f a -> a -> Fix f
fixana coalg = fix $ \f -> Fix . fmap f . coalg

I ran some benchmarks and the results are surprising me. criterion reports something like a tenfold speedup, specifically when O2 is enabled. I wonder what causes such massive improvement, and begin to seriously doubt my benchmarking abilities.

This is the exact criterion code I use:

smallWord, largeWord :: Word
smallWord = 2^10
largeWord = 2^20

shortEnv, longEnv :: Fix Maybe
shortEnv = ana coAlg smallWord
longEnv = ana coAlg largeWord

benchCata = nf (cata alg)
benchFixcata = nf (fixcata alg)

benchAna = nf (ana coAlg)
benchFixana = nf (fixana coAlg)

main = defaultMain
    [ bgroup "cata"
        [ bgroup "short input"
            [ env (return shortEnv) $ \x -> bench "cata"    (benchCata x)
            , env (return shortEnv) $ \x -> bench "fixcata" (benchFixcata x)
            ]
        , bgroup "long input"
            [ env (return longEnv) $ \x -> bench "cata"    (benchCata x)
            , env (return longEnv) $ \x -> bench "fixcata" (benchFixcata x)
            ]
        ]
    , bgroup "ana"
        [ bgroup "small word"
            [ bench "ana" $ benchAna smallWord
            , bench "fixana" $ benchFixana smallWord
            ]
        , bgroup "large word"
            [ bench "ana" $ benchAna largeWord
            , bench "fixana" $ benchFixana largeWord
            ]
        ]
    ]

And some auxiliary code:

alg :: Algebra Maybe Word
alg Nothing = 0
alg (Just x) = succ x

coAlg :: CoAlgebra Maybe Word
coAlg 0 = Nothing
coAlg x = Just (pred x)

Compiled with O0, the digits are pretty even. With O2, fix~ functions seem to outperform the plain ones:

benchmarking cata/short input/cata
time                 31.67 μs   (31.10 μs .. 32.26 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 31.20 μs   (31.05 μs .. 31.46 μs)
std dev              633.9 ns   (385.3 ns .. 1.029 μs)
variance introduced by outliers: 18% (moderately inflated)

benchmarking cata/short input/fixcata
time                 2.422 μs   (2.407 μs .. 2.440 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.399 μs   (2.388 μs .. 2.410 μs)
std dev              37.12 ns   (31.44 ns .. 47.06 ns)
variance introduced by outliers: 14% (moderately inflated)

I would appreciate if someone can confirm or spot a flaw.

*I compiled things with ghc 8.2.2 on this occasion.)

postscriptum

This post from back in 2012 elaborates on the performance of fix in quite a fine detail. (Thanks to @chi for the link.)

chi · Accepted Answer

This is due to how the fixed point is computed by fix. This was pointed out by @duplode above (and by myself in a related question). Anyway, we can summarize the issue as follows.

We have that

fix f = f (fix f)

works, but makes a fix f new call at every recursion. Instead,

fix f = go
   where go = f go

computes the same fixed point avoiding that call. In the libraries fix is implemented in this more efficient way.

Back to the question, consider the following three implementations of cata:

cata :: Functor f => Algebra f b -> Fix f -> b
cata alg' = alg' . fmap (cata alg') . unFix

cata2 :: Functor f => Algebra f b -> Fix f -> b
cata2 alg' = go
   where
   go = alg' . fmap go . unFix

fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg' = fix $ \f -> alg' . fmap f . unFix

The first one makes a call cata alg' at every recursion. The second one does not. The third one also does not since the library fix is efficient.

And indeed, we can use Criterion to confirm this, even using the same test used by the OP:

benchmarking cata/short input/cata
time                 16.58 us   (16.54 us .. 16.62 us)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 16.62 us   (16.58 us .. 16.65 us)
std dev              111.6 ns   (89.76 ns .. 144.0 ns)

benchmarking cata/short input/cata2
time                 1.746 us   (1.742 us .. 1.749 us)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.741 us   (1.736 us .. 1.744 us)
std dev              12.69 ns   (10.50 ns .. 17.31 ns)

benchmarking cata/short input/fixcata
time                 2.010 us   (2.003 us .. 2.016 us)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.006 us   (2.001 us .. 2.011 us)
std dev              16.40 ns   (14.05 ns .. 19.27 ns)

Long inputs also show the improvement.

benchmarking cata/long input/cata
time                 119.3 ms   (113.4 ms .. 125.8 ms)
                     0.996 R²   (0.992 R² .. 1.000 R²)
mean                 119.8 ms   (117.7 ms .. 121.7 ms)
std dev              2.924 ms   (2.073 ms .. 4.064 ms)
variance introduced by outliers: 11% (moderately inflated)

benchmarking cata/long input/cata2
time                 17.89 ms   (17.43 ms .. 18.36 ms)
                     0.996 R²   (0.992 R² .. 0.999 R²)
mean                 18.02 ms   (17.49 ms .. 18.62 ms)
std dev              1.362 ms   (853.9 us .. 2.022 ms)
variance introduced by outliers: 33% (moderately inflated)

benchmarking cata/long input/fixcata
time                 18.03 ms   (17.56 ms .. 18.50 ms)
                     0.996 R²   (0.992 R² .. 0.999 R²)
mean                 18.17 ms   (17.57 ms .. 18.72 ms)
std dev              1.365 ms   (852.1 us .. 2.045 ms)
variance introduced by outliers: 33% (moderately inflated)

I also experimented with ana, observing that the performance of a similarly improved ana2 agrees with fixana. No surprises there as well.

Why do these fixpoint cata / ana morphism definitions outperform the recursive ones?

Tags:

performance

optimization

benchmarking

recursion

haskell

Ignat Insarov

1 Answers

chi

Recent Activity

Donate For Us

Why do these fixpoint cata / ana morphism definitions outperform the recursive ones?

Tags:

performance

optimization

benchmarking

recursion

haskell

Ignat Insarov

1 Answers

chi

Related questions

Recent Activity

Donate For Us