
Why is writeSTRef faster than an if expression?

Tags: haskell

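Version 1, which calls writeSTRef twice per iteration:

{-# LANGUAGE BangPatterns #-}
import Control.Monad (replicateM_)
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

fib3 :: Int -> Integer
fib3 n = runST $ do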
    a <- newSTRef 1
    b <- newSTRef 1
    replicateM_ (n-1) $ do
        !a' <- readSTRef a
        !b' <- readSTRef b
        writeSTRef a b'
        writeSTRef b $! a'+b'
    readSTRef b

Version 2, which calls writeSTRef once per iteration and uses an if expression to pick the reference to overwrite:

fib4 :: Int -> Integer
fib4 n = runST $ do
    a <- newSTRef 1
    b <- newSTRef 1
    replicateM_ (n-1) $ do
        !a' <- readSTRef a
        !b' <- readSTRef b
        if a' > b'
          then writeSTRef b $! a'+b'
          else writeSTRef a $! a'+b'
    a'' <- readSTRef a
    b'' <- readSTRef b
    if a'' > b''
      then return a''
      else return b''

Benchmark, given n = 20000:

benchmarking 20000/fib3
mean: 5.073608 ms, lb 5.071842 ms, ub 5.075466 ms, ci 0.950
std dev: 9.284321 us, lb 8.119454 us, ub 10.78107 us, ci 0.950

benchmarking 20000/fib4
mean: 5.384010 ms, lb 5.381876 ms, ub 5.386099 ms, ci 0.950
std dev: 10.85245 us, lb 9.510215 us, ub 12.65554 us, ci 0.950

fib3 is a bit faster than fib4 (about 5.07 ms versus 5.38 ms, roughly 6%), even though it performs twice as many writes.

wenlong asked Mar 27 '12 05:03

1 Answer

I think you already got some answers from #haskell; basically, each writeSTRef boils down to one or two writes to memory, which is cheap in this instance since they probably never even get past the level 1 cache.

The branch resulting from the if-then-else in fib4, on the other hand, creates two paths that are taken alternately on successive iterations, which is a bad case for many CPU branch predictors and adds bubbles to the pipeline. See http://en.wikipedia.org/wiki/Instruction_pipeline.
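
To see the alternation concretely, here is a minimal sketch that models the loop state of fib4 purely and records which branch each iteration would take. The helper name branchTrace is hypothetical and not part of the original post; it only mirrors the a' > b' comparison and the two writes, and says nothing about how a particular CPU actually predicts the branch.

```haskell
-- Pure model of fib4's loop: the pair (a, b) stands in for the two STRefs.
-- `branchTrace` is a hypothetical helper introduced for illustration only.
branchTrace :: Int -> [Bool]
branchTrace steps = take steps (map takesThen (iterate step (1, 1)))
  where
    -- True means the "then" branch (a' > b') would be taken.
    takesThen :: (Integer, Integer) -> Bool
    takesThen (a, b) = a > b

    -- One loop iteration: overwrite whichever value is not the larger one.
    step :: (Integer, Integer) -> (Integer, Integer)
    step (a, b)
      | a > b     = (a, a + b)   -- then: writeSTRef b $! a'+b'
      | otherwise = (a + b, b)   -- else: writeSTRef a $! a'+b'
```

In GHCi, branchTrace 6 evaluates to [False,True,False,True,False,True]: starting with the else branch, the two paths strictly alternate.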

How about the pure version?

fib0 :: Int -> Integer
fib0 = go 0 1 where
    go :: Integer -> Integer -> Int -> Integer
    go a b n = case n > 0 of
        True -> go b (a + b) (n - 1)
        False -> b
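
Before comparing timings, it may be worth confirming that the three versions compute the same numbers. The sketch below copies fib0, fib3 and fib4 from this page and adds a hypothetical helper, agree, that compares them for a given n; it assumes only the base library.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Monad (replicateM_)
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

-- Pure accumulator version (from the answer).
fib0 :: Int -> Integer
fib0 = go 0 1
  where
    go :: Integer -> Integer -> Int -> Integer
    go a b n = case n > 0 of
        True  -> go b (a + b) (n - 1)
        False -> b

-- Two writes per iteration (from the question).
fib3 :: Int -> Integer
fib3 n = runST $ do
    a <- newSTRef 1
    b <- newSTRef 1
    replicateM_ (n-1) $ do
        !a' <- readSTRef a
        !b' <- readSTRef b
        writeSTRef a b'
        writeSTRef b $! a'+b'
    readSTRef b

-- One write per iteration, chosen by an if expression (from the question).
fib4 :: Int -> Integer
fib4 n = runST $ do
    a <- newSTRef 1
    b <- newSTRef 1
    replicateM_ (n-1) $ do
        !a' <- readSTRef a
        !b' <- readSTRef b
        if a' > b'
          then writeSTRef b $! a'+b'
          else writeSTRef a $! a'+b'
    a'' <- readSTRef a
    b'' <- readSTRef b
    return (max a'' b'')

-- Hypothetical helper: True when all three versions agree for this n.
agree :: Int -> Bool
agree n = fib0 n == fib3 n && fib3 n == fib4 n
```

In GHCi, all agree [0..200] can be used as a quick check, and fib0 10 evaluates to 89.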

fib0 is even faster, though only slightly ahead of fib3 (this time with n = 40000):

benchmarking fib0 40000
mean: 17.14679 ms, lb 17.12902 ms, ub 17.16739 ms, ci 0.950
std dev: 97.28594 us, lb 82.39644 us, ub 120.1041 us, ci 0.950

benchmarking fib3 40000
mean: 17.32658 ms, lb 17.30739 ms, ub 17.34931 ms, ci 0.950
std dev: 106.7610 us, lb 89.69371 us, ub 126.8279 us, ci 0.950

benchmarking fib4 40000
mean: 18.13887 ms, lb 18.11173 ms, ub 18.16868 ms, ci 0.950
std dev: 145.9772 us, lb 127.6892 us, ub 168.3347 us, ci 0.950
liyang answered Oct 16 '22 20:10
