Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the kth largest element in an unsorted array of length n in O(n)?

People also ask

How do you find the kth largest element in an unsorted array in C?

Algorithm​ 1. Sort the array using any sorting technique in descending order. 2. Iterate through the array till you reach the K-th element and then print the value at the K-th index.

How do you find KTH smallest or largest element in an unsorted array?

Approach: Using Max Heap Observe the following algorithm. Step 1: Using the first k elements of the input array (a[0], …, a[k - 1], create a Max-Heap. Step 2: Compare each element that is coming after the k'th element (a[k] to a[n - 1]) with the root element of the max-heap.


This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.

Everything you need is in these powerpoint slides. Just to extract the basic algorithm of the O(n) worst-case algorithm (introselect):

Select(A,n,i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group’s median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find ith element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k-1, i)
    If i > k, return Select(G, n-k, i-k)

It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.


If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.

Here is the algorithm.

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2))
  else
    # it's equal to the pivot
    return pivot

What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

T(n) = Theta(n) + T(n-1) = Theta(n2)

But if the choices are indeed random, the expected running time is given by

T(n) <= Theta(n) + (1/n) ∑i=1 to nT(max(i, n-i-1))

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

Let's guess that T(n) <= an for some a. Then we get

T(n) 
 <= cn + (1/n) ∑i=1 to nT(max(i-1, n-i))
 = cn + (1/n) ∑i=1 to floor(n/2) T(n-i) + (1/n) ∑i=floor(n/2)+1 to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n ai

and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑i=n/2 to n an, we get roughly 2(1/n)(n/2)an = an. But this is too big - there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:

i=floor(n/2) to n i  
 = ∑i=1 to n i - ∑i=1 to floor(n/2) i  
 = n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2  
 <= n2/2 - (n/4)2/2  
 = (15/32)n2

where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with

cn + 2 (1/n) ∑i=floor(n/2) to n ai,
 <= cn + (2a/n) (15/32) n2
 = n (c + (15/16)a)
 <= an

provided a > 16c.

This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).


A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17

"Make one pass through tracking the three largest values so far." 

(it was specifically for 3d largest)

and this answer:

Build a heap/priority queue.  O(n)
Pop top element.  O(log n)
Pop top element.  O(log n)
Pop top element.  O(log n)

Total = O(n) + 3 O(log n) = O(n)

You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done, otherwise you repeat with the bin (higher or lower), that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).


A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.

I answered the letter of your question :)


The C++ standard library has almost exactly that function call nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.

const int N = ...;
double a[N];
// ... 
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a

Although not very sure about O(n) complexity, but it will be sure to be between O(n) and nLog(n). Also sure to be closer to O(n) than nLog(n). Function is written in Java

public int quickSelect(ArrayList<Integer>list, int nthSmallest){
    //Choose random number in range of 0 to array length
    Random random =  new Random();
    //This will give random number which is not greater than length - 1
    int pivotIndex = random.nextInt(list.size() - 1); 

    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two. 
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for(int i=0; i<list.size(); i++){
        if(list.get(i)<pivot){
            smallerNumberList.add(list.get(i));
        }
        else if(list.get(i)>pivot){
            greaterNumberList.add(list.get(i));
        }
        else{
            //Do nothing
        }
    }

    //If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list 
    if(nthSmallest < smallerNumberList.size()){
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //The step is bit tricky. If confusing, please see the above loop once again for clarification.
    else if(nthSmallest > (list.size() - greaterNumberList.size())){
        //nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in 
        //smallerNumberList
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList,nthSmallest);
    }
    else{
        return pivot;
    }
}