Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast sorting in Haskell

After reading Stack Overflow question Using vectors for performance improvement in Haskell describing a fast in-place quicksort in Haskell, I set myself two goals:

  • Implementing the same algorithm with a median of three to avoid bad performances on pre-sorted vectors;

  • Making a parallel version.

Here is the result (some minor pieces have been left for simplicity):

import qualified Data.Vector.Unboxed.Mutable as MV
import qualified Data.Vector.Generic.Mutable as GM

type Vector = MV.IOVector Int
type Sort = Vector -> IO ()

medianofthreepartition :: Vector -> Int -> IO Int
medianofthreepartition uv li = do
    p1 <- MV.unsafeRead uv li
    p2 <- MV.unsafeRead uv $ li `div` 2
    p3 <- MV.unsafeRead uv 0
    let p = median p1 p2 p3
    GM.unstablePartition (< p) uv

vquicksort :: Sort
vquicksort uv = do
    let li = MV.length uv - 1
    j <- medianofthreepartition uv li
    when (j > 1) (vquicksort (MV.unsafeSlice 0 j uv))
    when (j + 1 < li) (vquicksort (MV.unsafeSlice (j+1) (li-j) uv))

vparquicksort :: Sort
vparquicksort uv = do
    let li = MV.length uv - 1
    j <- medianofthreepartition uv li
    t1 <- tryfork (j > 1) (vparquicksort (MV.unsafeSlice 0 j uv))
    t2 <- tryfork (j + 1 < li) (vparquicksort (MV.unsafeSlice (j+1) (li-j) uv))
    wait t1
    wait t2

tryfork :: Bool -> IO () -> IO (Maybe (MVar ()))
tryfork False _ = return Nothing
tryfork True action = do
  done <- newEmptyMVar :: IO (MVar ())
  _ <- forkFinally action (\_ -> putMVar done ())
  return $ Just done

wait :: Maybe (MVar ()) -> IO ()
wait Nothing = return ()
wait (Just done) = swapMVar done ()

median :: Int -> Int -> Int -> Int
median a b c
        | a > b =
                if b > c then b
                        else if a > c then c
                                else a
        | otherwise =
                if a > c then a
                        else if b > c then c
                                else b

For vectors with 1,000,000 elements, I get the following results:

"Number of threads: 4"

"**** Parallel ****"
"Testing sort with length: 1000000"
"Creating vector"
"Printing vector"
"Sorting random vector"
CPU time:  12.30 s
"Sorting ordered vector"
CPU time:   9.44 s

"**** Single thread ****"
"Testing sort with length: 1000000"
"Creating vector"
"Printing vector"
"Sorting random vector"
CPU time:   0.27 s
"Sorting ordered vector"
CPU time:   0.39 s

My questions are:

  • Why are performances still decreasing with a pre-sorted vector?
  • Why does using forkIO and four thread fails to improve performances?
like image 881
Simon Bergot Avatar asked Jul 28 '13 20:07

Simon Bergot


People also ask

What sorting algorithm does Haskell use?

It does not use quicksort; rather, it uses an efficient implementation of an algorithm called mergesort.

How do I sort a list in Haskell?

Whenever we want to sort the elements of a given list, then we make use of the sort function in Haskell programming language, and the name of the list consisting of elements that are to be sorted is passed as a parameter to the sort function and the sort function takes a list as the input and returns the sorted list as ...

Is bubble sort stable?

Is Bubble Sort Stable? Yes, Bubble Sort is a stable sorting algorithm. We swap elements only when A is less than B. If A is equal to B, we do not swap them, hence relative order between equal elements will be maintained.


1 Answers

A better idea is to use Control.Parallel.Strategies to parallelize quicksort. With this approach you will not create expensive threads for every code that can be executed in parallel. You can also create a pure computation instead an IO.

Then you have to compile according to the number of cores you have: http://www.haskell.org/ghc/docs/latest/html/users_guide/using-concurrent.html

For an example, look at this simple quicksort on lists, written by Jim Apple:

import Data.HashTable as H
import Data.Array.IO
import Control.Parallel.Strategies
import Control.Monad
import System

exch a i r =
    do tmpi <- readArray a i
       tmpr <- readArray a r
       writeArray a i tmpr
       writeArray a i tmpi

bool a b c = if c then a else b

quicksort arr l r =
  if r <= l then return () else do
    i <- loop (l-1) r =<< readArray arr r
    exch arr i r
    withStrategy rpar $ quicksort arr l (i-1)
    quicksort arr (i+1) r
  where
    loop i j v = do
      (i', j') <- liftM2 (,) (find (>=v) (+1) (i+1)) (find (<=v) (subtract 1) (j-1))
      if (i' < j') then exch arr i' j' >> loop i' j' v
                   else return i'
    find p f i = if i == l then return i
                 else bool (return i) (find p f (f i)) . p =<< readArray arr i

main = 
    do [testSize] <- fmap (fmap read) getArgs
       arr <- testPar testSize
       ans <- readArray arr  (testSize `div` 2)
       print ans

testPar testSize =
    do x <- testArray testSize
       quicksort x 0 (testSize - 1)
       return x

testArray :: Int -> IO (IOArray Int Double)
testArray testSize = 
    do ans <- newListArray (0,testSize-1) [fromIntegral $ H.hashString $ show i | i <- [1..testSize]]
       return ans
like image 142
Boldizsár Németh Avatar answered Sep 25 '22 02:09

Boldizsár Németh