I need to find the indices of the k largest elements of an unsorted array/vector of length n in C++, with k < n. I have seen how to use nth_element() to find the k-th order statistic, but I'm not sure it is the right choice for my problem: it seems I would need to make k calls to nth_element(), which I guess would have complexity O(kn). Is that as good as it can get, or is there a way to do this in just O(n)?
Implementing it without nth_element() seems to require iterating over the whole array once, updating a list of the indices of the largest elements at each step.
Is there anything in the standard C++ library that makes this a one-liner or any clever way to implement this myself in just a couple lines? In my particular case, k = 3, and n = 6, so efficiency isn't a huge concern, but it would be nice to find a clean and efficient way to do this for arbitrary k and n.
It looks like "Mark the top N elements of an unsorted array" is probably the closest posting I can find on SO, but the answers there are in Python and PHP.
Observe that the Kth largest element is the same as the (N - K + 1)th smallest element (counting from 1), i.e. the element at 0-based index N - K of the sorted array, where N is the size of the given array. Therefore, we can apply the Kth-smallest approach to this problem. First, a pivot element must be chosen, similar to what we do in quicksort.
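For the original question, the pivot-based selection described above is what std::nth_element() provides, and a single call is enough if it is applied to a vector of indices rather than to the values. Below is a minimal sketch under that assumption; the helper name top_k_indices is made up for illustration and is not part of the standard library. The call is linear in n on average, and the k indices come back in no particular order.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (name chosen for illustration): returns the indices of
// the k largest values in `values`, in unspecified order.
std::vector<std::size_t> top_k_indices(const std::vector<double>& values, std::size_t k)
{
    k = std::min(k, values.size());
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0}); // idx = 0, 1, ..., n-1

    if (k < idx.size()) {
        // Rearrange the indices so that the first k refer to the k largest
        // values; the comparator orders indices by descending value.
        std::nth_element(idx.begin(), idx.begin() + k, idx.end(),
                         [&values](std::size_t a, std::size_t b) {
                             return values[a] > values[b];
                         });
    }
    idx.resize(k); // keep only the top-k indices
    return idx;
}
```

Calling top_k_indices(test, 3) on a vector like the one in the answers below would return the three winning indices; sort the result afterwards if a particular order is needed.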
Approach: Using a max-heap. Observe the following algorithm. Step 1: Using the first k elements of the input array (a[0], …, a[k - 1]), create a max-heap. Step 2: Compare each element that comes after the k'th element (a[k] to a[n - 1]) with the root element of the max-heap.
Using a max-heap: the problem can also be solved by building one heap over the whole input. The idea is to construct a max-heap of size n containing all the array elements [0…n-1], then pop the first k-1 elements from it, at a cost of O(log(n)) per pop. The k'th largest element will then reside at the root of the max-heap.
Here is my implementation that does what I want and I think is reasonably efficient:
#include <iostream>
#include <queue>
#include <utility>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
    std::vector<double> test = {0.2, 1.0, 0.01, 3.0, 0.002, -1.0, -20};
    // Max-heap of (value, index) pairs; pairs compare by value first.
    std::priority_queue<std::pair<double, int>> q;
    for (int i = 0; i < static_cast<int>(test.size()); ++i) {
        q.push(std::pair<double, int>(test[i], i));
    }
    int k = 3; // number of indices we need
    for (int i = 0; i < k; ++i) {
        int ki = q.top().second; // index of the current largest value
        std::cout << "index[" << i << "] = " << ki << std::endl;
        q.pop();
    }
}
which gives output:
index[0] = 3
index[1] = 1
index[2] = 0
This should be an improved version of @hazelnusse's answer. It keeps only k elements in the heap, so it runs in O(n log k) instead of O(n log n):
#include <functional>
#include <iostream>
#include <queue>
#include <utility>
#include <vector>
// maxindices.cc
// compile with:
// g++ -std=c++11 maxindices.cc -o maxindices
int main()
{
    std::vector<double> test = {2, 8, 7, 5, 9, 3, 6, 1, 10, 4};
    // Min-heap of (value, index) pairs: the top is the smallest value kept so far.
    std::priority_queue<std::pair<double, int>,
                        std::vector<std::pair<double, int>>,
                        std::greater<std::pair<double, int>>> q;
    int k = 5; // number of indices we need
    for (int i = 0; i < static_cast<int>(test.size()); ++i) {
        if (static_cast<int>(q.size()) < k) {
            q.push(std::pair<double, int>(test[i], i));
        } else if (q.top().first < test[i]) {
            // The new value beats the smallest kept value, so replace it.
            q.pop();
            q.push(std::pair<double, int>(test[i], i));
        }
    }
    k = static_cast<int>(q.size()); // may be smaller than requested if n < k
    std::vector<int> res(k);
    // The heap pops in ascending order; fill res from the back to get descending order.
    for (int i = 0; i < k; ++i) {
        res[k - i - 1] = q.top().second;
        q.pop();
    }
    for (int i = 0; i < k; ++i) {
        std::cout << res[i] << std::endl;
    }
}
which prints the indices 8, 4, 1, 2, 6 (one per line), ordered from the largest value to the smallest.
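If the goal is something closer to a one-liner from the standard library, the same O(n log k) behaviour can probably be had without a hand-managed heap by running std::partial_sort() over a vector of indices. The following is only a sketch under that assumption; the helper name top_k_indices_sorted is hypothetical. Unlike the nth_element() route, the returned indices are ordered from the largest value to the smallest.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (name chosen for illustration): returns the indices of
// the k largest values in `values`, ordered from largest to smallest value.
std::vector<std::size_t> top_k_indices_sorted(const std::vector<double>& values, std::size_t k)
{
    k = std::min(k, values.size());
    std::vector<std::size_t> idx(values.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0}); // idx = 0, 1, ..., n-1

    // Sort only the first k positions; the comparator orders indices by
    // descending value, so those positions end up holding the top-k indices.
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&values](std::size_t a, std::size_t b) {
                          return values[a] > values[b];
                      });
    idx.resize(k);
    return idx;
}
```

With the test vector from the answer above and k = 5, this should give the same indices in the same order, since all values there are distinct. With duplicate values, which of the tied indices comes first is not specified.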