Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is Array.sort performance affected by the initial ordering of the input?

Is it possible for the ordering of the input to affect the performance of Array.sort()? If so, how?

like image 348
Alex Ciminian Avatar asked Sep 14 '17 21:09

Alex Ciminian


People also ask

For which sorting techniques performance is dependent of the initial order of the array items?

Explanation: In Insertion sort if the array is already sorted then it takes O(n) and if it is reverse sorted then it takes O(n2) to sort the array. In Quick sort, if the array is already sorted or if it is reverse sorted then it takes O(n2). The best and worst case performance of Selection is O(n2) only.

What is the most efficient way of sorting an array?

Quicksort. Quicksort is one of the most efficient sorting algorithms, and this makes of it one of the most used as well. The first thing to do is to select a pivot number, this number will separate the data, on its left are the numbers smaller than it and the greater numbers on the right.

How is sorting performed through array?

The array can be sorted in ascending order by repeatedly finding the minimum element (considering ascending order) from unsorted part and putting it at the beginning. The algorithm maintains two subarrays in a given array. The subarray which is already sorted. Remaining subarray which is unsorted.

Does array sort changes the original array?

The sort() method returns a reference to the original array, so mutating the returned array will mutate the original array as well.


1 Answers

This depends on a few things:

  • the runtime (different browsers/runtimes use different sorting algorithms)
  • how your input is arranged relative to the desired order
  • if you are using a custom comparator or not (also related to the previous point)

An application I am working on experienced severe performance degradation in a module which was sorting a list of 35K+ strings after the API endpoint it was hitting started offering it data in sorted order. The time spent in the front-end sort went from around 30 ms to 6 seconds (200x).

The sorting is done using a custom comparator which prioritises strings ending with a certain suffix. If none or both of the strings end in the suffix, then natural ordering is used. I profiled the module using the browser developer tools and found out that most of the time was spent doing this comparison. The profile also showed that QuickSort is the underlying algorithm used by Array.sort() (at least in Chrome).

quicksort profile

This was strange, as QuickSort should not be affected by the ordering of the input. According to Wikipedia:

[the worst case] may occur if the pivot happens to be the smallest or largest element in the list, or in some implementations (e.g., the Lomuto partition scheme as described above) when all the elements are equal.

I became curious and benchmarked multiple variants of doing the sort. I used benchmark.js running on node in the command line. Both the benchmark and the browser are running on top of v8 so they should use the same sorting algorithm. The results were surprising:

6 tests completed.

  Ordered array, sorted with a default comparator           x 34.27 ops/sec ±1.07% (59 runs sampled)
  Ordered array, sorted with a custom comparator            x  0.18 ops/sec ±2.81% (5 runs sampled)
  Ordered array, shuffled, sorted with a custom comparator  x 38.37 ops/sec ±3.67% (51 runs sampled)
  Ordered array, shuffled, sorted with a default comparator x 29.20 ops/sec ±1.28% (51 runs sampled)
  Unordered array, sorted with a default comparator         x 28.38 ops/sec ±1.28% (50 runs sampled)
  Unordered array, sorted with a custom comparator          x 42.10 ops/sec ±1.32% (55 runs sampled)

These results indicate that the performance degradation is because of the distribution of the data in relation to the comparator. Here are some characteristics of the input:

  • the suffix the comparator prioritises (/Prod) matches about 20% of the strings
  • when the strings are sorted alphabetically, the ones ending in /Prod are very likely to be spread relatively uniformly throughout
  • sequences like ABC/Alpha, ABC/Beta, ABC/Prod are common.

This probably makes the algorithm more likely to select a pivot which is at an extreme in its subsequence and thus causes a very large number of comparisons between elements to be performed.

This only happens in Chrome 61. I've tested both Firefox 52.3 and Safari 10.1 and the issue is not reproduced. I presume it's because a different sorting algorithm is used.

like image 148
Alex Ciminian Avatar answered Sep 27 '22 19:09

Alex Ciminian