Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Complexity of partial_sort vs nth_element

According to cppreference.com, the complexity of the C++ STL sorting algorithms is:

sort: O(N log(N))

partial_sort: "approximately" O(N log(M)), where M is distance(middle-first)

nth_element: "on average" O(N)

However, this seems to imply that, instead of doing a partial_sort, you could use nth_element and then sort the first range, to give an overall complexity of O(N + M log(M)), which is a bit better than O(N log(M)). Is this actually true? Am I better off avoiding partial_sort?

like image 626
rlbond Avatar asked Jul 02 '15 22:07

rlbond


People also ask

What is partial_ sort in c++?

C++ Algorithm partial_sort() function is used to rearrange the elements in the range[first, last), in such a way that the elements between the first and middle will be sorted and the elements between the middle and last will be in an unspecified order.

Is std :: sort stable?

As of September 2020, it appears that libc++ std::sort happens to be stable for all ranges of size less than 31, and libstdc++ std::sort happens to be stable for all ranges of size less than 17. (Do not rely on this little factoid in production!) To be clear: There's nothing wrong with this.

How is Nth_element implemented?

The nth_element function is typically implemented using Introselect, which brings the average complexity down to O(n).

What is incremental sorting?

Incremental sorting is a version of the partial sorting problem where the input is given up front but k is unknown: given a k-sorted array, it should be possible to extend the partially sorted part so that the array becomes (k+1)-sorted.


1 Answers

std::partial_sort would perform partial sort for the M elements you are interested in. On the other hand std::nth_element would only give you an array, such that nth element is placed such that all elements on the left are smaller and on the right are greater.

Use std::partial_sort for use cases such as, getting top 10 results out of a million in order of rank. Use std::nth_element for finding the median of an array, or to find out who stood 10th in exam results.

If you are just interested in the performance characteristics of both, for smaller values of M, std::partial_sort would perform better than std::nth_element (about 10,000) . For a detailed analysis of this, see: https://www.youtube.com/watch?v=-0tO3Eni2uo

Summary of video

std::nth_element uses modified Quickselect, which provides O(N) complexity regardless of M.

std::partial_sort uses Heapselect, which provides better performance than Quickselect for small M. As a side effect, the end state of Heapselect leaves you with a heap, which means that you get the first half of the Heapsort algorithm "for free".

std::partial_sort is optimized for the case where M is a small constant relative to N. For example, taking the top 10 items from a very large variable-length list. It is not optimized for the other cases.

In a race between std::partial_sort and std::nth_element + std::sort, std::partial_sort jumps out to an early lead (small M) but is overtaken by std::nth_element + std::sort once M is no longer small.

like image 77
Average Joe Avatar answered Oct 01 '22 17:10

Average Joe