Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is my Haskell selection sort implementation extremely fast?

I implemented selection sort and compared it to Data.List's sort. It is orders of magnitudes faster than Data.List's sort. If I apply it to 10,000 randomly generated numbers the results are as follows:

 ✓ in 1.22µs: Selection sort
 ✓ in 9.84ms: Merge sort (Data.List)

This can't be right. First I thought maybe merge sort's intermediate results are cached and selection sort uses those to be much faster. Even when I comment out merge sort and only time selection sort, it is this fast however. I also verified the output and it is correctly sorted.

What causes this behaviour?

I use this code to test:

{-# LANGUAGE BangPatterns #-}

module Lib
    ( testSortingAlgorithms
    ) where

import System.Random (randomRIO)
import Text.Printf
import Control.Exception
import System.CPUTime
import Data.List (sort, sortOn)


selectionSort :: Ord a => [a] -> [a]
selectionSort [] = []
selectionSort nrs = 
  let (smallest, rest) = getSmallest nrs
  in smallest : selectionSort rest
  where getSmallest :: Ord a => [a] -> (a, [a])
        getSmallest [a] = (a, [])
        getSmallest (a:as) = let (smallest, rest) = getSmallest as
                             in if smallest > a then (a, smallest : rest)
                                                else (smallest, a : rest)


main :: IO ()
main = testSortingAlgorithms


testSortingAlgorithms :: IO ()
testSortingAlgorithms = do
    !list' <- list (10000)
    results <- mapM (timeIt list') sorts
    let results' = sortOn fst results
    mapM_ (\(diff, msg) -> printf (msg) (diff::Double)) results'
    return ()


sorts :: Ord a => [(String, [a] -> [a])]
sorts = [
        ("Selection sort", selectionSort)
      , ("Merge sort (Data.List)", sort)
    ]


list :: Int -> IO [Int]
list n = sequence $ replicate n $ randomRIO (-127,127::Int)


timeIt :: (Ord a, Show a) 
       => [a] -> (String, [a] -> [a]) -> IO (Double, [Char])
timeIt vals (name, sorter) = do
    start <- getCPUTime
    --v <- sorter vals `seq` return ()
    let !v = sorter vals
    --putStrLn $ show v
    end   <- getCPUTime
    let (diff, ext) = unit $ (fromIntegral (end - start)) / (10^3)
    let msg = if correct v 
              then (" ✓ in %0.2f" ++ ext ++ ": " ++ name ++ "\n") 
              else (" ✗ in %0.2f" ++ ext ++ ": " ++ name ++ "\n") 
    return (diff, msg)


correct :: (Ord a) => [a] -> Bool
correct [] = True
correct (a:[]) = True
correct (a1:a2:as) = a1 <= a2 && correct (a2:as)


unit :: Double -> (Double, String)
unit v | v < 10^3 = (v, "ns")
       | v < 10^6 = (v / 10^3, "µs")
       | v < 10^9 = (v / 10^6, "ms")
       | otherwise = (v / 10^9, "s")
like image 835
Arthur Avatar asked Dec 17 '22 14:12

Arthur


1 Answers

You write

let !v = sorter vals

which is "strict", but only to WHNF. So you are only timing how long it takes to find the smallest element of the list, not how long it takes to sort the whole thing. Selection sort starts by doing exactly that, so it is "optimal" for this incorrect benchmark, while mergesort does a bunch more work that's "wasted" if you only look at the first element.

like image 124
amalloy Avatar answered Jan 11 '23 22:01

amalloy