Fast functional merge sort

Here's my implementation of merge sort in Scala:

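import scala.collection.immutable.Stream.Empty

object FuncSort {
  // Merge two sorted streams; the tail after #:: is evaluated lazily.
  def merge(l: Stream[Int], r: Stream[Int]): Stream[Int] = {
    (l, r) match {
      case (_, Empty) => l // also covers the case where both are empty
      case (Empty, _) => r
      case (x #:: xs, y #:: ys) => if (x < y) x #:: merge(xs, r) else y #:: merge(l, ys)
    }
  }

  def sort(xs: Stream[Int]): Stream[Int] = {
    val n = xs.length // forces the whole stream
    if (n <= 1) xs    // an empty stream would otherwise recurse forever
    else {
      val (l, r) = xs.splitAt(n / 2)
      merge(sort(l), sort(r))
    }
  }
}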

It works correctly and seems to be asymptotically fine as well, but it is way slower (approx. 10 times) than the Java implementation at http://algs4.cs.princeton.edu/22mergesort/Merge.java.html and uses a lot of memory. Is there a faster implementation of merge sort which is functional? Obviously, it's possible to port the Java version line by line, but that's not what I'm looking for.

UPD: I've changed Stream to List and #:: to :: and the sorting routine became faster, only three to four times slower than the Java version. But I don't understand why it doesn't crash with a stack overflow. merge isn't tail-recursive and all arguments are strictly evaluated... how is that possible?

asked Sep 22 '13 13:09

synapse

1 Answer

You have raised multiple questions. I will try to answer them in a logical order:

No stack overflow in the Stream version

You did not really ask this one, but it leads to some interesting observations.

In the Stream version you are using #:: merge(...) inside the merge function. Usually this would be a recursive call and might lead to a stack overflow for big enough input data. But not in this case. The operator #::(a,b) is implemented in class ConsWrapper[A] (there is an implicit conversion) and is a synonym for cons.apply[A](hd: A, tl: ⇒ Stream[A]): Cons[A]. As you can see, the second argument is passed by name, meaning it is evaluated lazily.

That means merge returns a newly created object of type Cons whose tail will call merge again only when it is demanded. In other words: the recursion does not happen on the stack, but on the heap. And usually you have plenty of heap.
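The lazy tail can be made visible with a small experiment. This is only a sketch: the object name LazyTailDemo and the calls counter are made up for illustration, and the merge below uses isEmpty/head/tail instead of pattern matching. On Scala 2.13 and later, Stream is deprecated in favour of LazyList, so expect deprecation warnings there.

```scala
object LazyTailDemo {
  // Hypothetical counter: how many times merge has actually been entered.
  var calls = 0

  def merge(l: Stream[Int], r: Stream[Int]): Stream[Int] = {
    calls += 1
    if (l.isEmpty) r
    else if (r.isEmpty) l
    // The right-hand side of #:: is passed by name, so the nested
    // merge call is stored in the new cell and not executed yet.
    else if (l.head <= r.head) l.head #:: merge(l.tail, r)
    else r.head #:: merge(l, r.tail)
  }

  def main(args: Array[String]): Unit = {
    val evens = Stream.range(0, 1000000, 2)
    val odds  = Stream.range(1, 1000000, 2)
    val merged = merge(evens, odds)
    println(s"calls after merge returned: $calls")
    println(merged.take(5).toList)
    println(s"calls after taking 5 elements: $calls")
  }
}
```

merge returns after a single call even though the inputs hold a million elements in total. Further calls happen one at a time as elements are demanded, and each of them starts from a shallow stack.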

Using the heap for recursion is a nice technique to handle very deep recursion. But it is much slower than using the stack, so you have traded speed for recursion depth. This is the main reason why using Stream is so slow.

The second reason is that Scala has to materialize the whole Stream to get its length. But sorting the Stream has to materialize every element anyway, so this does not hurt very much.

No stack overflow in List version

When you change Stream to List, you are indeed using the stack for recursion, so a stack overflow could now happen. But the recursion in sort only reaches a depth of about log(size), with the logarithm taken to base 2. So to sort 4 billion input items, sort would need only about 32 nested stack frames. With a default stack size of at least 320k (on Windows, other systems have larger defaults), this leaves room for a lot of recursion and hence for lots of input data to be sorted. Note that this bound covers sort only: merge on List is not tail-recursive and recurses once per merged element, so its depth grows linearly with the list length and very long lists can still overflow the stack.
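The logarithmic depth applies to sort. A merge written as x :: merge(xs, r) still adds one frame per element. One way to keep the List version stack-safe for long inputs is to give merge an accumulator so that it becomes tail-recursive. The following is a minimal sketch under that assumption; the names StackSafeListSort, acc and depth are hypothetical and not part of the code in the question.

```scala
import scala.annotation.tailrec

object StackSafeListSort {
  // Tail-recursive merge: the result is built in reverse in acc,
  // so the compiler can turn the recursion into a loop.
  @tailrec
  def merge(l: List[Int], r: List[Int], acc: List[Int] = Nil): List[Int] =
    (l, r) match {
      case (Nil, _) => acc.reverse ::: r
      case (_, Nil) => acc.reverse ::: l
      case (x :: xs, y :: ys) =>
        if (x <= y) merge(xs, r, x :: acc) else merge(l, ys, y :: acc)
    }

  // Not tail-recursive, but its depth is only about log2(length).
  def sort(xs: List[Int]): List[Int] = {
    val n = xs.length
    if (n <= 1) xs
    else {
      val (l, r) = xs.splitAt(n / 2)
      merge(sort(l), sort(r))
    }
  }

  // Number of nested sort calls needed for an input of n elements.
  def depth(n: Int): Int = if (n <= 1) 0 else 1 + depth(n - n / 2)
}
```

For example, StackSafeListSort.depth(1000000) is 20, so sorting a million elements nests sort only 20 levels deep, while merge no longer consumes stack at all.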

Faster functional implementation

It depends :-)

You should use the stack, and not the heap for recursion. And you should decide your strategy depending on the input data:

For small data blocks, sort them in place with some straightforward algorithm. The algorithmic complexity won't bite you, and you can gain a lot of performance from having all data in cache. Of course, you could still hand-code sorting networks for the given sizes.
If you have numeric input data, you can use radix sort and hand the work to the vector units on your processor or your GPU (more sophisticated algorithms can be found in GPU Gems).
For medium-sized data blocks, you can use a divide-and-conquer strategy to split the data across multiple threads (only if you have multiple cores!).
For huge data blocks, use merge sort and split the data into blocks that fit in memory. If you want, you can distribute these blocks over the network and sort them in memory.
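As an illustration of the divide-and-conquer idea for medium-sized blocks, here is a rough sketch that sorts the two halves of an array on separate threads and merges the results. The object name ParallelSortSketch is made up, the split into exactly two halves is an arbitrary simplification, and the global execution context is assumed to be acceptable. Whether this pays off depends on the data size and the number of cores, so measure before relying on it.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelSortSketch {
  // Merge two sorted arrays into a new sorted array.
  def merge(a: Array[Int], b: Array[Int]): Array[Int] = {
    val out = new Array[Int](a.length + b.length)
    var i = 0
    var j = 0
    var k = 0
    while (i < a.length && j < b.length) {
      if (a(i) <= b(j)) { out(k) = a(i); i += 1 }
      else { out(k) = b(j); j += 1 }
      k += 1
    }
    // Copy whatever is left of either input.
    while (i < a.length) { out(k) = a(i); i += 1; k += 1 }
    while (j < b.length) { out(k) = b(j); j += 1; k += 1 }
    out
  }

  // Sort one half in a Future and the other on the calling thread,
  // then merge. The input array is not modified.
  def sort(xs: Array[Int]): Array[Int] = {
    val (l, r) = xs.splitAt(xs.length / 2)
    val left = Future(l.sorted)
    val right = r.sorted
    merge(Await.result(left, Duration.Inf), right)
  }
}
```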

Avoid swapping to disk and make good use of your caches. Use mutable data structures if you can and sort in place. I think that functional and fast sorting do not work very well together. To make sorting really fast, you will have to use stateful operations (e.g. in-place mergesort on mutable arrays).

I usually try this in all my programs: use a pure functional style as far as possible, but use stateful operations for small parts when it pays off (e.g. because it performs better, or because the code has to deal with lots of state and becomes much more readable when I use vars instead of vals).
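A sketch of what that mix could look like for sorting: a function that is pure from the caller's point of view but works on a private mutable array inside. The names HybridSort and Cutoff and the cutoff value of 16 are assumptions for illustration. The merge uses one auxiliary buffer, so it is array-based but not strictly in place.

```scala
object HybridSort {
  // Assumed threshold below which insertion sort is used; tune by measuring.
  private val Cutoff = 16

  // Pure from the outside: the input vector is never modified.
  def sorted(xs: Vector[Int]): Vector[Int] = {
    val a = xs.toArray                 // private mutable copy
    val aux = new Array[Int](a.length) // one scratch buffer shared by all merges
    sortRange(a, aux, 0, a.length)
    a.toVector
  }

  // Sorts the half-open range a[lo, hi).
  private def sortRange(a: Array[Int], aux: Array[Int], lo: Int, hi: Int): Unit =
    if (hi - lo <= Cutoff) insertionSort(a, lo, hi)
    else {
      val mid = lo + (hi - lo) / 2
      sortRange(a, aux, lo, mid)
      sortRange(a, aux, mid, hi)
      mergeRange(a, aux, lo, mid, hi)
    }

  // Simple quadratic sort for small, cache-resident blocks.
  private def insertionSort(a: Array[Int], lo: Int, hi: Int): Unit = {
    var i = lo + 1
    while (i < hi) {
      val v = a(i)
      var j = i - 1
      while (j >= lo && a(j) > v) { a(j + 1) = a(j); j -= 1 }
      a(j + 1) = v
      i += 1
    }
  }

  // Merges the sorted ranges a[lo, mid) and a[mid, hi) via aux.
  private def mergeRange(a: Array[Int], aux: Array[Int], lo: Int, mid: Int, hi: Int): Unit = {
    System.arraycopy(a, lo, aux, lo, hi - lo)
    var i = lo
    var j = mid
    var k = lo
    while (k < hi) {
      if (i >= mid) { a(k) = aux(j); j += 1 }
      else if (j >= hi) { a(k) = aux(i); i += 1 }
      else if (aux(j) < aux(i)) { a(k) = aux(j); j += 1 }
      else { a(k) = aux(i); i += 1 }
      k += 1
    }
  }
}
```

Callers only ever see immutable vectors going in and coming out; the vars and the array writes stay confined to the private helpers.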

answered Oct 22 '22 12:10

stefan.schwetschke

Tags: algorithm, sorting, scala