Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this Haskell code so much slower than the C equivalent? Unboxed vectors and bangs already used

I wrote a short Mandelbrot Set generator in Haskell and in C and discovered that the C version ran 20x faster than the Haskell version. While I expected Haskell to be slower, I did not expect more than an order of magnitude considering I am already using unboxed vectors and bangs to avoid excessive thunking.

Profiling reveals that most of the time is spend in the the go function of the following code, which is really just a loop with some comparisons, multiplications and additions.

orbits limit radius a b = go 0 0 0
  where
    r2 = radius * radius
    go !n !x !y
      | n == limit    = n
      | x2 + y2 >= r2 = n
      | otherwise     = go (n + 1) (x2 - y2 + a) (2 * x * y + b)
      where
        x2 = x * x
        y2 = y * y

In execution, it takes the C code 0.9 seconds to run and it takes the equivalent Haskell code 18 seconds. They both implement the same algorithm and they both generate the same output (PGM graphics file).

The Haskell source code is here:

  • Main.hs
  • MandelV.hs
  • mandel3.prof (profiling report)

The C code is here:

  • mandel.c

What could be the problem that causes this huge difference in performance?

like image 871
rityzmon Avatar asked Jun 20 '16 20:06

rityzmon


1 Answers

tl;dr - add type signature, use ByteString and turn on -O3

But first of all - as others have said before - you are not comparing the same things, while your c code uses a lot of mutability and the weak type system of c. And I believe also the writing to a file is more unsafe than the haskell equivalent. You have the benefit of type-checking/type-inference of haskell.

Also note that without any type signature your code is polymorphic - i.e. you could use the same code with Float or Double, Word8 or Int if you desired to do so. Here lies the first trap - for integral numbers GHC defaults to Integer, an arbitrary precision integral number, equivalent to "bigint", which is usually slower by orders of magnitude.

Therefore adding a type signature, increases the speed tremendously.

(for exercise and learning) I did some comparison and implementation using unboxed types (ub-mandel), a typed version (mandel) and the op's untyped version (ut-mandel), as well as the c version (c-mandel).

measuring these programs you get the following (on a modern laptop using linux)

★ time ./c-mandel
./c-mandel  0,46s user 0,00s system 99% cpu 0,467 total

★ time stack exec -- ut-mandel
stack exec -- ut-mandel  9,33s user 0,09s system 99% cpu 9,432 total

★ time stack exec -- mandel
stack exec -- mandel  1,70s user 0,04s system 99% cpu 1,749 total

★ time stack exec -- ub-mandel
stack exec -- ub-mandel  1,25s user 0,08s system 98% cpu 1,343 total

Obviously the c code beats all implementations - but just adding the type signature brings a speedup by a factor of 5.49. Though doing the migration to unboxed types (which I have to admit was a first time) brings another 36 % speedup (note: this speedup is at the cost of the readability of the code).

But still this can be improved switching from the String version to ByteString gets us even further.

★ time stack exec -- ub-mandel-bytestring
stack exec -- ub-mandel-bytestring 0,84s user 0,04s system 98% cpu 0,890 total

Lessons learned

  • turn on type signatures
  • turn on -O3
  • use Bytestring
  • if your code is still not fast enough - invest an hour and move to unboxed types
  • if you still have energy, go read on reading llvm output and what the compiler does, a starting point would be neil mitchell's blog article

Note: all these calculations were done without the use of parallel computations, which I think could be done a lot more easily in haskell than in C. But this is a task I leave to someone else, or take a look at gh: simonmar/parconc-examples for a version that runs parallel on the gpu.

For completeness' sake the unboxed, bytestring version:

Main.hs

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE MagicHash #-}

module Main where

import Control.Monad
import Data.ByteString.Char8 as C
import System.IO (withFile, IOMode(WriteMode), Handle)

import           GHC.Prim
import           GHC.Exts (Int(..), Double(..))
import qualified Data.Vector.Unboxed as U
import qualified MandelV as MV

savePgm :: Int -> Int -> Int -> U.Vector Int -> String -> IO ()
savePgm w h orbits v filename =
  withFile filename WriteMode $ \f -> do
    hPutStrLn f "P2"
    hPutStrLn f $ C.pack $ show w ++ " " ++ show h
    hPutStrLn f (C.pack $ show orbits)
    U.imapM_ (elm f) v
  where
      elm :: Handle -> Int -> Int -> IO ()
      elm f ix e =
          if rem ix w == 0
             then hPutStrLn f $ C.pack $ show e
             else hPutStr f $ C.pack $ show e ++ " "

main :: IO ()
main = do
  let w        = 2560#  :: Int#
      h        = 1600#  :: Int#
      x1       = -2.0## :: Double#
      y1       = -1.5## :: Double#
      x2       = 1.0##  :: Double#
      y2       = 1.5##  :: Double#
      filename = "test_hs.pgm"
      orbits   = 63#    :: Int#
      radius   = 2.0##  :: Double#

      v = MV.mandelbrot orbits radius x1 y1 x2 y2 w h :: U.Vector Int
  savePgm (I# w) (I# h) (I# orbits) v filename

MandelV.hs

{-# LANGUAGE MagicHash #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE UnboxedTuples #-}

module MandelV where

import GHC.Prim
import GHC.Exts
import qualified Data.Vector.Unboxed as U

orbits :: Int# -> Double# -> Double# -> Double# -> Int#
orbits limit radius a b =
    go 0# 0.0## 0.0##
  where
    r2 = radius *## radius
    go  ::  Int# -> Double# -> Double# -> Int#
    go !n !x !y
      | unsafeCoerce# (n ==# limit)       = n
      | unsafeCoerce# (x2 +## y2 >=## r2) = n
      | otherwise                         = go (n +# 1#) (x2 -## y2 +## a) (2.0## *## x *## y +## b)
      where
        x2 = x *## x
        y2 = y *## y

mandelbrot :: Int# -> Double# -> Double# -> Double# -> Double# -> Double# -> Int# -> Int# -> U.Vector Int
mandelbrot limit radius x1 y1 x2 y2 w h = U.generate (I# (w *# h)) f
  where
    mx = (x2 -## x1) /## int2Double# (w -# 1#)
    my = (y2 -## y1) /## int2Double# (h -# 1#)
    f :: Int -> Int
    f (I# ix) = I# (orbits limit radius x y)
      where (# j,i #) = quotRemInt# ix w
            x          = mx *## (x1 +## int2Double# i)
            y          = my *## (y1 +## int2Double# j)

relevant portions of

mandel.cabal

executable ub-mandel
  main-is:             Main.hs
  other-modules:       MandelV
  -- other-extensions:    
  build-depends:       base >=4.8 && <4.9
               ,       vector >=0.11 && <0.12
               ,       ghc-prim
               ,       bytestring
  hs-source-dirs:      unboxed
  default-language:    Haskell2010
  ghc-options:         -O3
like image 193
epsilonhalbe Avatar answered Nov 06 '22 17:11

epsilonhalbe