Scala includes several methods in the standard library for sorting a list, for example to sort the list list, one can use:
list.sorted
list.sortWith(_<_)
list.sortBy(x=>x)
While these might be the simplest ways to sort a list, I found that for larger lists, they have significant performance drawback.
For example, to sort one million integers, sorted takes on average 500ms, while sortWith and sortBy take around 700ms. This is compared to scala.util.Sorting.quickSort which takes around 120ms and java.util.Arrays.sort which takes around 100ms. For larger lists, this multiple factor difference is observed as we scale further. The pattern is shown in the following chart.
What is the reason for this lag in performance? And why aren't more efficient algorithms/implementations used for the standard methods?
Notice how the lines have the same slope, but are offset from each other? With the logarithmic scale, we're looking at a constant factor difference. sorted
and friends pay the cost of converting a List
to an Array
, sorting (with java.util.Arrays.sort
, in fact), and converting back to List
. scala.util.Sorting.quickSort
and java.util.Arrays.sort
operate on arrays directly. The log n
factor in quicksort's n log n
performance is largely irrelevant, so with the linear time needed to create the Array and the resulting list we end up with a constant factor difference. Five times worse performance might looking horrible, but remember the List
has a cons cell for each element, which makes for massive amounts of random access when creating the Array
, and then creating the new List
requires time spent allocating memory, and in all likelihood, a garbage collection cycle or two.
For lists of primitives, it's even worse. List
is generic, so any primitives must be boxed, which adds another layer of indirection. And unfortunately the Array
that is created holds boxed values as well. Effectively, you end up sorting an Array[java.lang.Integer]
when you really want to be sorting an Array[Int]
.
To summarize: the sorting algorithms are identical, but there are good reasons why mutable arrays outperform immutable singly-linked lists.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With