Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find all values that occur more than n/k times in an array?

I'm pursuing the Algorithms, Part I course on Coursera, and one of the interview questions (ungraded) is as follows:

Decimal dominants. Given an array with n keys, design an algorithm to find all values that occur more than n/10 times. The expected running time of your algorithm should be linear.

It has a hint:

determine the (n/10)th largest key using quickselect and check if it occurs more than n/10 times.

I don't understand what does the n/10 largest key have to do with n/10 repeated values. It won't tell me which values occur more than n/10 times.

There's a paper that finds a more general solution for n/k, but I'm having a hard time understanding the code in the paper.

One way to solve it is to sort the input array, and then make another pass counting the occurrence of each distinct value. That'll take O(nlogn) + O(n) time, which is more than what the question asks for.

Ideas?

like image 525
Abhijit Sarkar Avatar asked May 28 '18 03:05

Abhijit Sarkar


2 Answers

Finding the n/10th largest key (that is, the key that would be at position n/10 if the array was sorted) takes linear time using QuickSelect. If there are less than n/10 copies of this key, then you know that there are not n/10 copies of anything above it in sorted order, because there isn't room for n/10 copies of anything above the key in sorted order. If there are n/10 or more copies, then you have found something that occurs more than n/10 times, and again there can't be anything larger than it that occurs more than n/10 times, because there isn't room for it.

Now you have an array of at most 9n/10 values smaller than the key you have just found left over from QuickSelect. Use another pass of QuickSelect to find the key n/10 from the top of this left over array. As before, you may find a key that occurs n/10 or more times, and whether you do or not you will eliminate at least n/10 entries from the array.

So you can search the whole array with 10 calls of QuickSelect, each taking linear time. 10 is a number fixed in the problem definition, so the whole operation counts as only linear time.

like image 88
mcdowella Avatar answered Nov 15 '22 07:11

mcdowella


There is a variation of Boyer-Moore Voting algorithm which can find all the elements that occurs more than n/k in a input which runs in O(nk) and since k = 10 for your problem I think it should run in O(n * 10) = O(n) time.

From here

Following is an interesting O(nk) solution: We can solve the above problem in O(nk) time using O(k-1) extra space. Note that there can never be more than k-1 elements in output (Why?). There are mainly three steps in this algorithm.

1) Create a temporary array of size (k-1) to store elements and their counts (The output elements are going to be among these k-1 elements). Following is structure of temporary array elements.

 struct eleCount {
     int element;
     int count; };  
 struct eleCount temp[];  This step takes O(k) time.

2) Traverse through the input array and update temp[] (add/remove an element or increase/decrease count) for every traversed element. The array temp[] stores potential (k-1) candidates at every step. This step takes O(nk) time.

3) Iterate through final (k-1) potential candidates (stored in temp[]). or every element, check if it actually has count more than n/k. This step takes O(nk) time.

The main step is step 2, how to maintain (k-1) potential candidates at every point? The steps used in step 2 are like famous game: Tetris. We treat each number as a piece in Tetris, which falls down in our temporary array temp[]. Our task is to try to keep the same number stacked on the same column (count in temporary array is incremented).

Consider k = 4, n = 9 Given array: 3 1 2 2 2 1 4 3 3

i = 0

     3 _ _ temp[] has one element, 3 with count 1

i = 1

     3 1 _ temp[] has two elements, 3 and 1 with  counts 1 and 1 respectively

i = 2

     3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 1 respectively.

i = 3

     - - 2 
     3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively.

i = 4

     - - 2 
     - - 2 
     3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 3 respectively.

i = 5

     - - 2 
     - 1 2 
     3 1 2 temp[] has three elements, 3, 1 and 2 with counts as 1, 2 and 3 respectively.  

Now the question arises, what to do when temp[] is full and we see a new element – we remove the bottom row from stacks of elements, i.e., we decrease count of every element by 1 in temp[]. We ignore the current element.

i = 6

     - - 2 
     - 1 2  temp[] has two elements, 1 and 2 with counts as 1 and 2 respectively.

i = 7

       - 2 
     3 1 2  temp[] has three elements, 3, 1 and 2 with counts as 1, 1 and 2 respectively.

i = 8

     3 - 2
     3 1 2  temp[] has three elements, 3, 1 and 2 with counts as 2, 1 and 2 respectively. 

Finally, we have at most k-1 numbers in temp[]. The elements in temp are {3, 1, 2}. Note that the counts in temp[] are useless now, the counts were needed only in step 2. Now we need to check whether the actual counts of elements in temp[] are more than n/k (9/4) or not. The elements 3 and 2 have counts more than 9/4. So we print 3 and 2.

For a proper proof of this approach check out this answer from cs.stackexchange

like image 23
thebenman Avatar answered Nov 15 '22 07:11

thebenman