Consider these definitions from a previous question:
type Algebra f a = f a -> a
cata :: Functor f => Algebra f b -> Fix f -> b
cata alg = alg . fmap (cata alg) . unFix
fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg = fix $ \f -> alg . fmap f . unFix
type CoAlgebra f a = a -> f a
ana :: Functor f => CoAlgebra f a -> a -> Fix f
ana coalg = Fix . fmap (ana coalg) . coalg
fixana :: Functor f => CoAlgebra f a -> a -> Fix f
fixana coalg = fix $ \f -> Fix . fmap f . coalg
I ran some benchmarks and the results are surprising me. criterion
reports something like a tenfold speedup, specifically when O2
is enabled. I wonder what causes such massive improvement, and begin to seriously doubt my benchmarking abilities.
This is the exact criterion
code I use:
smallWord, largeWord :: Word
smallWord = 2^10
largeWord = 2^20
shortEnv, longEnv :: Fix Maybe
shortEnv = ana coAlg smallWord
longEnv = ana coAlg largeWord
benchCata = nf (cata alg)
benchFixcata = nf (fixcata alg)
benchAna = nf (ana coAlg)
benchFixana = nf (fixana coAlg)
main = defaultMain
[ bgroup "cata"
[ bgroup "short input"
[ env (return shortEnv) $ \x -> bench "cata" (benchCata x)
, env (return shortEnv) $ \x -> bench "fixcata" (benchFixcata x)
]
, bgroup "long input"
[ env (return longEnv) $ \x -> bench "cata" (benchCata x)
, env (return longEnv) $ \x -> bench "fixcata" (benchFixcata x)
]
]
, bgroup "ana"
[ bgroup "small word"
[ bench "ana" $ benchAna smallWord
, bench "fixana" $ benchFixana smallWord
]
, bgroup "large word"
[ bench "ana" $ benchAna largeWord
, bench "fixana" $ benchFixana largeWord
]
]
]
And some auxiliary code:
alg :: Algebra Maybe Word
alg Nothing = 0
alg (Just x) = succ x
coAlg :: CoAlgebra Maybe Word
coAlg 0 = Nothing
coAlg x = Just (pred x)
Compiled with O0
, the digits are pretty even. With O2
, fix~
functions seem to outperform the plain ones:
benchmarking cata/short input/cata
time 31.67 μs (31.10 μs .. 32.26 μs)
0.999 R² (0.998 R² .. 1.000 R²)
mean 31.20 μs (31.05 μs .. 31.46 μs)
std dev 633.9 ns (385.3 ns .. 1.029 μs)
variance introduced by outliers: 18% (moderately inflated)
benchmarking cata/short input/fixcata
time 2.422 μs (2.407 μs .. 2.440 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.399 μs (2.388 μs .. 2.410 μs)
std dev 37.12 ns (31.44 ns .. 47.06 ns)
variance introduced by outliers: 14% (moderately inflated)
I would appreciate if someone can confirm or spot a flaw.
*I compiled things with ghc 8.2.2
on this occasion.)
postscriptum
This post from back in 2012 elaborates on the performance of fix
in quite a fine detail. (Thanks to @chi
for the link.)
This is due to how the fixed point is computed by fix
.
This was pointed out by @duplode above (and by myself in a related question). Anyway, we can summarize the issue as follows.
We have that
fix f = f (fix f)
works, but makes a fix f
new call at every recursion. Instead,
fix f = go
where go = f go
computes the same fixed point avoiding that call. In the libraries fix
is implemented in this more efficient way.
Back to the question, consider the following three implementations of cata
:
cata :: Functor f => Algebra f b -> Fix f -> b
cata alg' = alg' . fmap (cata alg') . unFix
cata2 :: Functor f => Algebra f b -> Fix f -> b
cata2 alg' = go
where
go = alg' . fmap go . unFix
fixcata :: Functor f => Algebra f b -> Fix f -> b
fixcata alg' = fix $ \f -> alg' . fmap f . unFix
The first one makes a call cata alg'
at every recursion. The second one does not. The third one also does not since the library fix
is efficient.
And indeed, we can use Criterion to confirm this, even using the same test used by the OP:
benchmarking cata/short input/cata
time 16.58 us (16.54 us .. 16.62 us)
1.000 R² (1.000 R² .. 1.000 R²)
mean 16.62 us (16.58 us .. 16.65 us)
std dev 111.6 ns (89.76 ns .. 144.0 ns)
benchmarking cata/short input/cata2
time 1.746 us (1.742 us .. 1.749 us)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.741 us (1.736 us .. 1.744 us)
std dev 12.69 ns (10.50 ns .. 17.31 ns)
benchmarking cata/short input/fixcata
time 2.010 us (2.003 us .. 2.016 us)
1.000 R² (1.000 R² .. 1.000 R²)
mean 2.006 us (2.001 us .. 2.011 us)
std dev 16.40 ns (14.05 ns .. 19.27 ns)
Long inputs also show the improvement.
benchmarking cata/long input/cata
time 119.3 ms (113.4 ms .. 125.8 ms)
0.996 R² (0.992 R² .. 1.000 R²)
mean 119.8 ms (117.7 ms .. 121.7 ms)
std dev 2.924 ms (2.073 ms .. 4.064 ms)
variance introduced by outliers: 11% (moderately inflated)
benchmarking cata/long input/cata2
time 17.89 ms (17.43 ms .. 18.36 ms)
0.996 R² (0.992 R² .. 0.999 R²)
mean 18.02 ms (17.49 ms .. 18.62 ms)
std dev 1.362 ms (853.9 us .. 2.022 ms)
variance introduced by outliers: 33% (moderately inflated)
benchmarking cata/long input/fixcata
time 18.03 ms (17.56 ms .. 18.50 ms)
0.996 R² (0.992 R² .. 0.999 R²)
mean 18.17 ms (17.57 ms .. 18.72 ms)
std dev 1.365 ms (852.1 us .. 2.045 ms)
variance introduced by outliers: 33% (moderately inflated)
I also experimented with ana
, observing that the performance of a similarly improved ana2
agrees with fixana
. No surprises there as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With