Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

haskell matrix implemetation performance

I heard a lot about amazing performance of programs written in Haskell, and wanted to make some tests. So, I wrote a 'library' for matrix operations just to compare it's performance with the same stuff written in pure C. First of all I tested 500000 matrices multiplication performance, and noticed that it was... never-ending (i. e. ending with out of memory exception after 10 minutes of so)! After studying haskell a bit more I managed to get rid of laziness and the best result I managed to get is ~20 times slower than its equivalent in C. So, the question: could you review the code below and tell if its performance can be improved a bit more? 20 times is still disappointing me a bit.

import Prelude hiding (foldr, foldl, product)
import Data.Monoid
import Data.Foldable

import Text.Printf
import System.CPUTime

import System.Environment

data Vector a = Vec3 a a a
              | Vec4 a a a a
                deriving Show

instance Foldable Vector where
  foldMap f (Vec3 a b c) = f a `mappend` f b `mappend` f c
  foldMap f (Vec4 a b c d) = f a `mappend` f b `mappend` f c `mappend` f d

data Matr a = Matr !a !a !a !a
                   !a !a !a !a
                   !a !a !a !a
                   !a !a !a !a

instance Show a => Show (Matr a) where
  show m = foldr f [] $ matrRows m
            where f a b = show a ++ "\n" ++ b

matrCols (Matr a0 b0 c0 d0 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3)
              = [Vec4 a0 a1 a2 a3, Vec4 b0 b1 b2 b3, Vec4 c0 c1 c2 c3, Vec4 d0 d1 d2 d3]

matrRows (Matr a0 b0 c0 d0 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3)
              = [Vec4 a0 b0 c0 d0, Vec4 a1 b1 c1 d1, Vec4 a2 b2 c2 d2, Vec4 a3 b3 c3 d3]

matrFromList [a0, b0, c0, d0, a1, b1, c1, d1, a2, b2, c2, d2, a3, b3, c3, d3]
        = Matr a0 b0 c0 d0
               a1 b1 c1 d1
               a2 b2 c2 d2
               a3 b3 c3 d3

matrId :: Matr Double
matrId  = Matr 1 0 0 0
               0 1 0 0
               0 0 1 0
               0 0 0 1

normalise (Vec4 x y z w) = Vec4 (x/w) (y/w) (z/w) 1

mult a b = matrFromList [f r c | r <- matrRows a, c <- matrCols b] where 
            f a b = foldr (+) 0 $ zipWith (*) (toList a) (toList b)
like image 532
MaksymB Avatar asked Jul 20 '11 06:07

MaksymB


2 Answers

First, I doubt that you'll ever get stellar performance with this implementation. There are too many conversions between different representations. You'd be better off basing your code on something like the vector package. Also you don't provide all your testing code, so there are probably other issues that we can't here. This is because the pipeline of production to consumption has a big impact on Haskell performance, and you haven't provided either end.

Now, two specific problems:

1) Your vector is defined as either a 3 or 4 element vector. This means that for every vector there's an extra check to see how many elements are in use. In C, I imagine your implementation is probably closer to

struct vec {
  double *vec;
  int length;
}

You should do something similar in Haskell; this is how vector and bytestring are implemented for example.

Even if you don't change the Vector definition, make the fields strict. You should also either add UNPACK pragmas (to Vector and Matrix) or compile with -funbox-strict-fields.

2) Change mult to

mult a b = matrFromList [f r c | r <- matrRows a, c <- matrCols b] where 
            f a b = Data.List.foldl' (+) 0 $ zipWith (*) (toList a) (toList b)

The extra strictness of foldl' will give much better performance in this case than foldr.

This change alone might make a big difference, but without seeing the rest of your code it's difficult to say.

like image 178
John L Avatar answered Nov 12 '22 18:11

John L


Answering my own question just to share new results I got yesterday:

  1. I upgraded ghc to the most recent version and performance became indeed not that bad (only ~7 times worse).

  2. Also I tried implementing the matrix in a stupid and simple way (see the listing below) and got really acceptable performance - only about 2 times slower than C equivalent.

    data Matr a = Matr ( a, a, a, a
                       , a, a, a, a
                       , a, a, a, a
                       , a, a, a, a) 
    
    mult (Matr (!a0,  !b0,  !c0,  !d0,  
                !a1,  !b1,  !c1,  !d1,  
                !a2,  !b2,  !c2,  !d2,  
                !a3,  !b3,  !c3,  !d3))
         (Matr (!a0', !b0', !c0', !d0', 
                !a1', !b1', !c1', !d1', 
                !a2', !b2', !c2', !d2', 
                !a3', !b3', !c3', !d3'))
         = Matr ( a0'', b0'', c0'', d0''
                , a1'', b1'', c1'', d1''
                , a2'', b2'', c2'', d2''
                , a3'', b3'', c3'', d3'')
            where a0'' = a0 * a0' + b0 * a1' + c0 * a2' + d0 * a3'
                  b0'' = a0 * b0' + b0 * b1' + c0 * b2' + d0 * b3'
                  c0'' = a0 * c0' + b0 * c1' + c0 * c2' + d0 * c3'
                  d0'' = a0 * d0' + b0 * d1' + c0 * d2' + d0 * d3'
    
                  a1'' = a1 * a0' + b1 * a1' + c1 * a2' + d1 * a3'
                  b1'' = a1 * b0' + b1 * b1' + c1 * b2' + d1 * b3'
                  c1'' = a1 * c0' + b1 * c1' + c1 * c2' + d1 * c3'
                  d1'' = a1 * d0' + b1 * d1' + c1 * d2' + d1 * d3'
    
                  a2'' = a2 * a0' + b2 * a1' + c2 * a2' + d2 * a3'
                  b2'' = a2 * b0' + b2 * b1' + c2 * b2' + d2 * b3'
                  c2'' = a2 * c0' + b2 * c1' + c2 * c2' + d2 * c3'
                  d2'' = a2 * d0' + b2 * d1' + c2 * d2' + d2 * d3'
    
                  a3'' = a3 * a0' + b3 * a1' + c3 * a2' + d3 * a3'
                  b3'' = a3 * b0' + b3 * b1' + c3 * b2' + d3 * b3'
                  c3'' = a3 * c0' + b3 * c1' + c3 * c2' + d3 * c3'
                  d3'' = a3 * d0' + b3 * d1' + c3 * d2' + d3 * d3'
    
like image 26
MaksymB Avatar answered Nov 12 '22 19:11

MaksymB