Why is there a 1000x performance difference between two versions of merge sort in haskell

Question

EDIT:

It turns out the slow version is actually an insertion sort O(n^2) rather than a merge sort O(n log n) explaining the performance issue. I thought I'd save future readers the pain of wading through the code to discover this answer.

ORIGINAL BEGINS HERE -------------------------------------------

I've written two versions of merge sort in haskell and I can't grok why the one is 1000 times faster than the other. In both cases we start by making item in the list to be sorted a list creating a list of list. Then we pair up lists and merge them until only one list remains. The problem seems to be that I'm calling "doMerge (x1:x2:xs) = doMerge $ merge x1 x2 : doMerge xs" in the slow version but calling doMerge (mergePairs xs) in the fast version. I'm surprised by the 1000x speed difference!

-- Better version: takes 0.34 seconds to sort a 100,000 integer list.
betMergeSort :: [Int] -> [Int]
betMergeSort list = doMerge $ map (\x -> [x]) list
  where
    doMerge :: [[Int]] -> [Int]
    doMerge [] = []
    doMerge [xs] = xs
    doMerge xs = doMerge (mergePairs xs)

    mergePairs :: [[Int]] -> [[Int]]
    mergePairs (x1:x2:xs) = merge x1 x2 : mergePairs xs
    mergePairs xs = xs

    -- expects two sorted lists and returns one sorted list.
    merge :: [Int] -> [Int] -> [Int]
    merge [] ys = ys
    merge xs [] = xs
    merge (x:xs) (y:ys) = if x <= y
                            then x : merge xs (y:ys)
                            else y : merge (x:xs) ys


-- Slow version: takes 350 seconds to sort a 100,000 integer list.
slowMergeSort :: [Int] -> [Int]
slowMergeSort list = head $ doMerge $ map (\x -> [x]) list
  where
    doMerge :: [[Int]] -> [[Int]]
    doMerge [] = []
    doMerge (oneList:[]) = [oneList]
    doMerge (x1:x2:xs) = doMerge $ merge x1 x2 : doMerge xs

    -- expects two sorted list and returns one sorted list.
    merge :: [Int] -> [Int] -> [Int]
    merge [] ys = ys
    merge xs [] = xs
    merge (x:xs) (y:ys) = if x <= y then x : merge xs (y:ys) else y : merge (x:xs) ys

Looking at the profiler output, it is clear that the slow version is allocating way more memory. I'm not sure why though. Both versions seem similar in my head. Can someone explain why the allocation is so different?

slowMergeSort Profiling result:

    Wed Aug 21 12:24 2013 Time and Allocation Profiling Report  (Final)

       mergeSort +RTS -sstderr -p -RTS s

    total time  =       12.02 secs   (12017 ticks @ 1000 us, 1 processor)
    total alloc = 17,222,571,672 bytes  (excludes profiling overheads)

COST CENTRE         MODULE  %time %alloc

slowMergeSort.merge Main     99.2   99.4


                                                                               individual     inherited
COST CENTRE               MODULE                             no.     entries  %time %alloc   %time %alloc

MAIN                      MAIN                                74           0    0.0    0.0   100.0  100.0
 main                     Main                               149           0    0.0    0.0   100.0  100.0
  main.sortedL            Main                               165           1    0.0    0.0    99.3   99.5
   slowMergeSort          Main                               167           1    0.0    0.0    99.3   99.5
    slowMergeSort.\       Main                               170       40000    0.0    0.0     0.0    0.0
    slowMergeSort.doMerge Main                               168       79999    0.0    0.0    99.2   99.5
     slowMergeSort.merge  Main                               169   267588870   99.2   99.4    99.2   99.4
  main.sortVersion        Main                               161           1    0.0    0.0     0.0    0.0
  randomInts              Main                               151           1    0.0    0.0     0.7    0.5
   force                  Main                               155           1    0.0    0.0     0.0    0.0
    force.go              Main                               156       40001    0.0    0.0     0.0    0.0
   randomInts.result      Main                               152           1    0.7    0.5     0.7    0.5

libMergeSort Profiling

    Wed Aug 21 12:23 2013 Time and Allocation Profiling Report  (Final)

       mergeSort +RTS -sstderr -p -RTS l

    total time  =        0.12 secs   (124 ticks @ 1000 us, 1 processor)
    total alloc = 139,965,768 bytes  (excludes profiling overheads)

COST CENTRE              MODULE  %time %alloc

randomInts.result        Main     66.9   64.0
libMergeSort.merge       Main     24.2   30.4
main                     Main      4.0    0.0
libMergeSort             Main      2.4    3.2
libMergeSort.merge_pairs Main      1.6    2.3


                                                                                    individual     inherited
COST CENTRE                    MODULE                             no.     entries  %time %alloc   %time %alloc

MAIN                           MAIN                                74           0    0.0    0.0   100.0  100.0
 main                          Main                               149           0    4.0    0.0   100.0  100.0
  main.sortedL                 Main                               165           1    0.0    0.0    28.2   35.9
   libMergeSort                Main                               167           1    2.4    3.2    28.2   35.9
    libMergeSort.\             Main                               171       40000    0.0    0.0     0.0    0.0
    libMergeSort.libMergeSort' Main                               168          17    0.0    0.0    25.8   32.7
     libMergeSort.merge_pairs  Main                               169       40015    1.6    2.3    25.8   32.7
      libMergeSort.merge       Main                               170      614711   24.2   30.4    24.2   30.4
  main.sortVersion             Main                               161           1    0.0    0.0     0.0    0.0
  randomInts                   Main                               151           1    0.0    0.0    67.7   64.0
   force                       Main                               155           1    0.0    0.0     0.8    0.0
    force.go                   Main                               156       40001    0.8    0.0     0.8    0.0
   randomInts.result           Main                               152           1   66.9   64.0    66.9   64.0

Philip JF · Accepted Answer

The second of these is O(n^2) and should not be used as it is the wrong algorithm (shouldn't be called mergesort).

doMerge (x1:x2:xs) = doMerge $ merge x1 x2 : doMerge xs

completly sorts xs which is only a constant shorter than the original list. Merging a very short list with a very long one is equivalent to insertion, so we see that this is really just insertion sort. The entire point of mergesort is divide and conquer, this is not divide and conquer. As the length of the lists grow greater the speed ratio will continue to get worse.

The first is a proper merge sort as it merges lists of about even length.

Why is there a 1000x performance difference between two versions of merge sort in haskell

Tags:

performance

haskell

Tim Perry

1 Answers

Philip JF

Recent Activity

Donate For Us

Why is there a 1000x performance difference between two versions of merge sort in haskell

Tags:

performance

haskell

Tim Perry

1 Answers

Philip JF

Related questions

Recent Activity

Donate For Us