I have decided to solve first programing assignment from Standford algorithm course https://class.coursera.org/algo-005 using Haskell. Despite I am very new to language I implemented it much faster than in c++. I have 6+ years of work experience in c++ so it impressed me a bit. But performance is disappointing: 0.19 sec (c++) vs 9.88 (haskell) version. How can I improve performance of Haskell implementation so it can be comparable to c++?
Here is my code in Haskell
data SortedList = SortedList {
inversionCount :: Int,
list :: [Int]
} deriving (Show)
-- first list accumulator
packm :: Int -> SortedList -> Int -> SortedList
packm x (SortedList count xs) add = SortedList (count + add) (x:xs)
merge2 :: [Int] -> [Int] -> SortedList
merge2 [] xs = SortedList 0 xs
merge2 xs [] = SortedList 0 xs
merge2 xlist@(x:xs) ylist@(y:ys)
| x < y = packm x (merge2 xs ylist) 0
| otherwise = packm y (merge2 xlist ys) $ length xlist
countAndMerge :: SortedList -> SortedList -> SortedList
countAndMerge (SortedList lcount lxs) (SortedList rcount rxs) =
let merged = merge2 lxs rxs
in SortedList (lcount + rcount + inversionCount merged) $ list merged
mergesort :: [Int] -> SortedList
mergesort [] = SortedList 0 []
mergesort [x] = SortedList 0 [x]
mergesort xs =
let leftsorted = mergesort $ take halfElements xs
rightsorted = mergesort $ drop halfElements xs
in countAndMerge leftsorted rightsorted
where halfElements = length xs `div` 2
main = do
contents <- getContents
let intlist = [ read x :: Int | x <- (lines contents) ]
print $ inversionCount $ mergesort intlist
The biggest problem is that the asymptotic performance isn't right to begin with; it's O(n^2 * log n) rather than the optimal O(n * log n). The culprit is merge2
:
| otherwise = packm y (merge2 xlist ys) $ length xlist
length xlist
is O(n). Supposing a random input list, we need to compute length xlist
on about half of the merge2
calls, thus making one level of merging O(n^2).
otherwise = packm y (merge2 xlist ys) $ length xlist
This computes length at every other step of the merge on the average. This makes the whole business quadratic.
If you track length of lists not by counting elements, but by passing the count down from the top level, you restore the O(N log N) behaviour. For a list of 100000 elements this means execution time goes down from 20 seconds to 0.45 second (on my machine with -O2).
Scaling it further up without changing the algorithm is problematic, because it currently runs in linear stack space, and cannot cope with 1 million elements with default RTS options. Change mergesort to a merge-adjacent-pairs version, it is likely to run much better.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With