Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell : matrix sorting much slower than vector sorting

Tags:

I have to sort the lines of large integer matrices in Haskell and I started benchmarking with random data. I found that Haskell is 3 times slower than C++.

Because of the randomness, I expect line comparison to always terminate at the first column (which should have no duplicates). So I narrowed the matrix to a single column implemented as a Vector (Unboxed.Vector Int) and compared its sorting to a usual Vector Int.

Vector Int sorts as fast as C++ (good news !), but again, the column matrix is 3 times slower. Do you have an idea why ? Please find the code below.

import qualified Data.Vector.Unboxed as UV(Vector, fromList)
import qualified Data.Vector as V(Vector, fromList, modify)
import Criterion.Main(env, bench, nf, defaultMain)
import System.Random(randomIO)
import qualified Data.Vector.Algorithms.Intro as Alg(sort)

randomVector :: Int -> IO (V.Vector Int)
randomVector count = V.fromList <$> mapM (\_ -> randomIO) [1..count]

randomVVector :: Int -> IO (V.Vector (UV.Vector Int))
randomVVector count = V.fromList <$> mapM (\_ -> do
                                               x <- randomIO
                                               return $ UV.fromList [x]) [1..count]

benchSort :: IO ()
benchSort = do
  let bVVect = env (randomVVector 300000) $ bench "sortVVector" . nf (V.modify Alg.sort)
      bVect = env (randomVector 300000) $ bench "sortVector" . nf (V.modify Alg.sort)
  defaultMain [bVect, bVVect]

main = benchSort
like image 834
V. Semeria Avatar asked Aug 10 '16 19:08

V. Semeria


2 Answers

As Edward Kmett as explained to me, the Haskell version has one extra layer of indirection. A UV.Vector looks something like

data Vector a = Vector !Int !Int ByteArray#

So each entry in your vector of vectors is actually a pointer to a record holding slice indices and a pointer to an array of bytes. This is an extra indirection that the C++ code doesn't have. The solution is to use an ArrayArray#, which is an array of direct pointers to byte arrays or to further ArrayArray#s. If you need vector, you'll have to figure out what to do about the slicing machinery. Another option is to switch to primitive, which offers simpler arrays.

like image 198
dfeuer Avatar answered Sep 24 '22 16:09

dfeuer


Following dfeuer's advice, implementing a vector of vectors as an ArrayArray# is 4 times faster than Vector (Unboxed.Vector Int) and only 40% slower than sorting a c++ std::vector<std::vector<int> > :

import Control.Monad.Primitive
import Data.Primitive.ByteArray
import qualified Data.Vector.Generic.Mutable.Base as GM(MVector(..))
import GHC.Prim

data MutableArrayArray s a = MutableArrayArray (MutableArrayArray# s)

instance GM.MVector MutableArrayArray ByteArray where
  {-# INLINE basicLength #-}
  basicLength (MutableArrayArray marr) = I# (sizeofMutableArrayArray# marr)

  {-# INLINE basicUnsafeRead #-}
  basicUnsafeRead (MutableArrayArray marr) (I# i) = primitive $ \s -> case readByteArrayArray# marr i s of
    (# s1, bar #) -> (# s1, ByteArray bar #)

  {-# INLINE basicUnsafeWrite #-}
  basicUnsafeWrite (MutableArrayArray marr) (I# i) (ByteArray bar) = primitive $ \s ->
    (# writeByteArrayArray# marr i bar s, () #)

For example, sorting a matrix of integers will then use

sortIntArrays :: ByteArray -> ByteArray -> Ordering
sortIntArrays x y = let h1 = indexByteArray x 0 :: Int
                        h2 = indexByteArray y 0 :: Int in
                    compare h1 h2
like image 42
V. Semeria Avatar answered Sep 25 '22 16:09

V. Semeria