I heard a lot about amazing performance of programs written in Haskell, and wanted to make some tests. So, I wrote a 'library' for matrix operations just to compare it's performance with the same stuff written in pure C. First of all I tested 500000 matrices multiplication performance, and noticed that it was... never-ending (i. e. ending with out of memory exception after 10 minutes of so)! After studying haskell a bit more I managed to get rid of laziness and the best result I managed to get is ~20 times slower than its equivalent in C. So, the question: could you review the code below and tell if its performance can be improved a bit more? 20 times is still disappointing me a bit.
import Prelude hiding (foldr, foldl, product)
import Data.Monoid
import Data.Foldable
import Text.Printf
import System.CPUTime
import System.Environment
data Vector a = Vec3 a a a
| Vec4 a a a a
deriving Show
instance Foldable Vector where
foldMap f (Vec3 a b c) = f a `mappend` f b `mappend` f c
foldMap f (Vec4 a b c d) = f a `mappend` f b `mappend` f c `mappend` f d
data Matr a = Matr !a !a !a !a
!a !a !a !a
!a !a !a !a
!a !a !a !a
instance Show a => Show (Matr a) where
show m = foldr f [] $ matrRows m
where f a b = show a ++ "\n" ++ b
matrCols (Matr a0 b0 c0 d0 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3)
= [Vec4 a0 a1 a2 a3, Vec4 b0 b1 b2 b3, Vec4 c0 c1 c2 c3, Vec4 d0 d1 d2 d3]
matrRows (Matr a0 b0 c0 d0 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3)
= [Vec4 a0 b0 c0 d0, Vec4 a1 b1 c1 d1, Vec4 a2 b2 c2 d2, Vec4 a3 b3 c3 d3]
matrFromList [a0, b0, c0, d0, a1, b1, c1, d1, a2, b2, c2, d2, a3, b3, c3, d3]
= Matr a0 b0 c0 d0
a1 b1 c1 d1
a2 b2 c2 d2
a3 b3 c3 d3
matrId :: Matr Double
matrId = Matr 1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
normalise (Vec4 x y z w) = Vec4 (x/w) (y/w) (z/w) 1
mult a b = matrFromList [f r c | r <- matrRows a, c <- matrCols b] where
f a b = foldr (+) 0 $ zipWith (*) (toList a) (toList b)
First, I doubt that you'll ever get stellar performance with this implementation. There are too many conversions between different representations. You'd be better off basing your code on something like the vector package. Also you don't provide all your testing code, so there are probably other issues that we can't here. This is because the pipeline of production to consumption has a big impact on Haskell performance, and you haven't provided either end.
Now, two specific problems:
1) Your vector is defined as either a 3 or 4 element vector. This means that for every vector there's an extra check to see how many elements are in use. In C, I imagine your implementation is probably closer to
struct vec {
double *vec;
int length;
}
You should do something similar in Haskell; this is how vector
and bytestring
are implemented for example.
Even if you don't change the Vector
definition, make the fields strict. You should also either add UNPACK
pragmas (to Vector and Matrix) or compile with -funbox-strict-fields
.
2) Change mult
to
mult a b = matrFromList [f r c | r <- matrRows a, c <- matrCols b] where
f a b = Data.List.foldl' (+) 0 $ zipWith (*) (toList a) (toList b)
The extra strictness of foldl'
will give much better performance in this case than foldr
.
This change alone might make a big difference, but without seeing the rest of your code it's difficult to say.
Answering my own question just to share new results I got yesterday:
I upgraded ghc to the most recent version and performance became indeed not that bad (only ~7 times worse).
Also I tried implementing the matrix in a stupid and simple way (see the listing below) and got really acceptable performance - only about 2 times slower than C equivalent.
data Matr a = Matr ( a, a, a, a
, a, a, a, a
, a, a, a, a
, a, a, a, a)
mult (Matr (!a0, !b0, !c0, !d0,
!a1, !b1, !c1, !d1,
!a2, !b2, !c2, !d2,
!a3, !b3, !c3, !d3))
(Matr (!a0', !b0', !c0', !d0',
!a1', !b1', !c1', !d1',
!a2', !b2', !c2', !d2',
!a3', !b3', !c3', !d3'))
= Matr ( a0'', b0'', c0'', d0''
, a1'', b1'', c1'', d1''
, a2'', b2'', c2'', d2''
, a3'', b3'', c3'', d3'')
where a0'' = a0 * a0' + b0 * a1' + c0 * a2' + d0 * a3'
b0'' = a0 * b0' + b0 * b1' + c0 * b2' + d0 * b3'
c0'' = a0 * c0' + b0 * c1' + c0 * c2' + d0 * c3'
d0'' = a0 * d0' + b0 * d1' + c0 * d2' + d0 * d3'
a1'' = a1 * a0' + b1 * a1' + c1 * a2' + d1 * a3'
b1'' = a1 * b0' + b1 * b1' + c1 * b2' + d1 * b3'
c1'' = a1 * c0' + b1 * c1' + c1 * c2' + d1 * c3'
d1'' = a1 * d0' + b1 * d1' + c1 * d2' + d1 * d3'
a2'' = a2 * a0' + b2 * a1' + c2 * a2' + d2 * a3'
b2'' = a2 * b0' + b2 * b1' + c2 * b2' + d2 * b3'
c2'' = a2 * c0' + b2 * c1' + c2 * c2' + d2 * c3'
d2'' = a2 * d0' + b2 * d1' + c2 * d2' + d2 * d3'
a3'' = a3 * a0' + b3 * a1' + c3 * a2' + d3 * a3'
b3'' = a3 * b0' + b3 * b1' + c3 * b2' + d3 * b3'
c3'' = a3 * c0' + b3 * c1' + c3 * c2' + d3 * c3'
d3'' = a3 * d0' + b3 * d1' + c3 * d2' + d3 * d3'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With