Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

is this implementation of merge sort good?

I've just started to learn Haskell last night and I've never used a functional programming language before. I just want to know if my implemention of merge sort is good or bad and what exactly is good or bad. Maybe it's even wrong - Well it does sort but maybe the Algorithm is not what I think what merge sort is.

Just tell me everything I could improve here. I by myself think its a pretty clear and simple implementation. Thanks for your advice, here's the code :)

merge [] ys = ys
merge xs [] = xs
merge xs ys =  sorted : merge left right
                where 
                    sorted = if head(xs) < head(ys) then head(xs) else head(ys)
                    left = if head(xs) <= head(ys) then tail(xs) else xs
                    right = if head(xs) > head(ys) then tail(ys) else ys

msort [] = []
msort [x] = [x]
msort xs = merge (msort left) (msort right)
            where 
                left = take (div (length xs) 2) xs
                right = drop (div (length xs) 2) xs
like image 447
Nocta Avatar asked Nov 19 '12 14:11

Nocta


1 Answers

Well, first of all, we can rewrite merge to be a little more elegant using pattern matching

merge [] ys = ys
merge xs [] = xs
merge xs@(x:xs1) ys@(y:ys1)
    | x <= y = x : merge xs1 ys
    | otherwise = y : merge xs ys1

In general you should avoid using head and tail since they are a bit unsafe (they raise an error for the empty list) and use pattern matching whenever possible.

The implementation of msort is pretty much spot on, except that we can split the list in a more efficient way. That's because length xs - takes O(N) to complete. The compiler might save you and cache the result of the length call so that the second call to length won't traverse the list again. But the take and drop will pretty much cause another two traversals thus splitting the list using 3 traversals which may prove to be expensive. We can do better by splitting the list in two lists - the first one containing the elements on the odd positions and the second list with the elements placed on the even positions, like so:

msort [] = []
msort [x] = [x]
msort xs = merge (msort first) (msort second)
    where
        (first, second) = splitInHalves xs
        splitInHalves [] = ([], [])
        splitInHalves [x] = ([x], [])
        splitInHalves (x:y:xs) =
            let (xs1, ys1) = splitInHalves xs
            in  (x:xs1, y:ys1)

This gets you the same Merge Sort in O(NlogN) time. It feels different because you would probably implement it in place (by modifying the original list) in an imperative language such as C. This version is slightly more costly on the memory, but it does have it's advantages - it is more easy to reason about, so it is more maintainable, and also it is very easy to parallelize without being concerned of anything else except the algorithm itself - which is exactly what a good programming language should provide for the developers that use it.

EDIT 1 :

If the syntax is a bit much, here are some resources:

  • Pattern Matching - the bit with the @ symbol is called an as-pattern. You'll find it in there
  • let is a keyword used to declare a variable to be used in the expression that follows it (whereas where binds a variable in the expression that precedes it). More on Haskell syntax, including guards (the things with | condition = value) can be found here, in this chapter of Learn You a Haskell

EDIT 2 :

@is7s proposed a far more concise version of splitInHalves using the foldr function:

splitInHalves = foldr (\x (l,r) -> (x:r,l)) ([],[])

EDIT 3 :

Here is another answer which provides an alternative implementation of merge sort, which also has the property of being stable:

Lazy Evaluation and Time Complexity

Hope this helps and welcome to the wonderful world of Functional Programming !

like image 76
Marius Danila Avatar answered Nov 04 '22 12:11

Marius Danila