Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala Collection sorted, sortWith and sortBy Performance

Scala includes several methods in the standard library for sorting a list, for example to sort the list list, one can use:

list.sorted
list.sortWith(_<_)
list.sortBy(x=>x)

While these might be the simplest ways to sort a list, I found that for larger lists, they have significant performance drawback.

For example, to sort one million integers, sorted takes on average 500ms, while sortWith and sortBy take around 700ms. This is compared to scala.util.Sorting.quickSort which takes around 120ms and java.util.Arrays.sort which takes around 100ms. For larger lists, this multiple factor difference is observed as we scale further. The pattern is shown in the following chart.

Performance of various Scala sorting methods

What is the reason for this lag in performance? And why aren't more efficient algorithms/implementations used for the standard methods?

like image 989
deepkimo Avatar asked May 11 '14 03:05

deepkimo


1 Answers

Notice how the lines have the same slope, but are offset from each other? With the logarithmic scale, we're looking at a constant factor difference. sorted and friends pay the cost of converting a List to an Array, sorting (with java.util.Arrays.sort, in fact), and converting back to List. scala.util.Sorting.quickSort and java.util.Arrays.sort operate on arrays directly. The log n factor in quicksort's n log n performance is largely irrelevant, so with the linear time needed to create the Array and the resulting list we end up with a constant factor difference. Five times worse performance might looking horrible, but remember the List has a cons cell for each element, which makes for massive amounts of random access when creating the Array, and then creating the new List requires time spent allocating memory, and in all likelihood, a garbage collection cycle or two.

For lists of primitives, it's even worse. List is generic, so any primitives must be boxed, which adds another layer of indirection. And unfortunately the Array that is created holds boxed values as well. Effectively, you end up sorting an Array[java.lang.Integer] when you really want to be sorting an Array[Int].

To summarize: the sorting algorithms are identical, but there are good reasons why mutable arrays outperform immutable singly-linked lists.

like image 170
wingedsubmariner Avatar answered Nov 02 '22 18:11

wingedsubmariner