According to cppreference.com, the complexity of the C++ STL sorting algorithms is:
sort
: O(N log(N))
partial_sort
: "approximately" O(N log(M)), where M is distance(middle-first)
nth_element
: "on average" O(N)
However, this seems to imply that, instead of doing a partial_sort
, you could use nth_element
and then sort the first range, to give an overall complexity of O(N + M log(M)), which is a bit better than O(N log(M)). Is this actually true? Am I better off avoiding partial_sort
?
C++ Algorithm partial_sort() function is used to rearrange the elements in the range[first, last), in such a way that the elements between the first and middle will be sorted and the elements between the middle and last will be in an unspecified order.
As of September 2020, it appears that libc++ std::sort happens to be stable for all ranges of size less than 31, and libstdc++ std::sort happens to be stable for all ranges of size less than 17. (Do not rely on this little factoid in production!) To be clear: There's nothing wrong with this.
The nth_element function is typically implemented using Introselect, which brings the average complexity down to O(n).
Incremental sorting is a version of the partial sorting problem where the input is given up front but k is unknown: given a k-sorted array, it should be possible to extend the partially sorted part so that the array becomes (k+1)-sorted.
std::partial_sort
would perform partial sort for the M elements you are interested in. On the other hand std::nth_element
would only give you an array, such that nth element is placed such that all elements on the left are smaller and on the right are greater.
Use std::partial_sort
for use cases such as, getting top 10 results out of a million in order of rank. Use std::nth_element
for finding the median of an array, or to find out who stood 10th in exam results.
If you are just interested in the performance characteristics of both, for smaller values of M, std::partial_sort
would perform better than std::nth_element
(about 10,000) . For a detailed analysis of this, see: https://www.youtube.com/watch?v=-0tO3Eni2uo
std::nth_element
uses modified Quickselect, which provides O(N) complexity regardless of M.
std::partial_sort
uses Heapselect, which provides better performance than Quickselect for small M. As a side effect, the end state of Heapselect leaves you with a heap, which means that you get the first half of the Heapsort algorithm "for free".
std::partial_sort
is optimized for the case where M is a small constant relative to N. For example, taking the top 10 items from a very large variable-length list. It is not optimized for the other cases.
In a race between std::partial_sort
and std::nth_element
+ std::sort
, std::partial_sort
jumps out to an early lead (small M) but is overtaken by std::nth_element
+ std::sort
once M is no longer small.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With