Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

data structure interview : find max number in array

There is this really good interview question that I encountered somewhere recently and I wanted to ask you all geniuses what can be the most optimized solution for this. So the question is as follows : Given an array of integers, find a maximum number n such that there are atleast n array elements which are greater than n. The input array is unsorted.

e.g. :

Input : 1,2,5,7,8,10 Output : n = 4

Input : 0,2,7,8,19,5,45,9,23 Output : n = 6

One solution I could think of(if the array is sorted case) is a sequential scan of all elements in the array to find out min:n and max:n. Then increment integers between min:n to max:n and check out one by one. But this is O(N) solution. Can somebody suggest a better one ?
e.g. : for input 1 min:n = 2 and max:n = 5
then you would check for numbers 2,3 and 4 as the answer.

As from the answers, if the array is unsorted there is no better than O(N) solution. But the next question is what if the given array is sorted ?

pseudocode :
// this assumes sorted input.
pubic int findhighestIndex(List<Integer> input){
it min=0,max=0,n=0,maxIndex=0;
for(int i=0;i<input.size();i++){
    if( input.get(i)>(input.size()-i) ){
        max=input.get(i);
        maxIndex=i;
        min=input.get(i-1);
        break;
    }
    else if(input.get(i)<(input.size()-i)){
        max=min=input.get(i);
    }
}
int i=max;
while( i>=min && (input.size()-maxIndex)<i ){
i--;
}
System.out.println(i);
}


Update : This problem is also known as finding h-index

like image 669
voidMainReturn Avatar asked Jul 02 '13 01:07

voidMainReturn


2 Answers

Edit: just figured out the O(n) solution for the unsorted case :) see below!

Sorted:

This can be solved in O(log N) for a sorted array, by means of a binary search on n. I'll be using OP's notation here, where N = # of elements and n is the answer we are looking for.

If the array is sorted, it basically means we need to find a position [N - n] so that such position in the array contains a value greater than n - if it does, then there are at least n values greater than it, regardless of repeated values.

Note an answer is always possible, as in the worst case the answer would be 0, and there are always at least 0 elements greater than it. The answer always gets "easier" for lower values, obviously, as it is easier to find 1 element greater than 1, than 10 elements greater than 10. But more importantly, this function follows a monotonic (non-decreasing) behavior which allows us to use a binary search on it.

The idea is as follows:

int N = 9;
int arr[10] = {0,2,5,7,8,9,19,23,45};

int lo = 0, hi = N+1, mid;
while(hi-lo > 1){
    mid = (hi+lo)/2;
    if(arr[N-mid] > mid) lo = mid;
    else hi = mid;
}
n = lo; //highest value that worked

Breakdown: The array has size 9. A binary search may begin trying value n = 5, so we just check whether the 5th element from the end of the array is greater than 5. In this case, 8 > 5 so we can try a better answer. The search would then attempt 7, but the element at position [N-7] is 5, which is lower than 7 and does not satisfy our constraints. Thus the search's last attempt is the value 6, which returns true as 7 > 6.

Unsorted:

For the unsorted case, the idea is incredibly very similar! We can solve it in O(n) by using a Selection Algorithm to identify the [N-n]th element, and at each step divide the search space in the same manner as the binary search.

We begin by searching from [0] to [N-1] to find the median (N/2 th) element, and we can rearrange the array in another O(N) step such that the median element is placed in its correct position, and every element before it has a value <= median, while every element after it has a value >=median.

Now, if that value is greater than n (in this case N/2), we showed above there are at least n elements greater than n, and thus we only need to search further in the lower half of the array. (If the median value is lower than n, we instead consider only the greater half of the array)

Now, assuming median >= N/2 we will repeat the same process from index [0] to [N/2], using a selection "sort" in O(N/2), and so on, each time dividing the search space by 2.

C++ code is as follows:

int N = 9;
int arr[9] = {0,2,7,8,19,5,45,9,23};

int lo = 0, hi = N, mid;
while(hi-lo > 1){
  mid = (hi+lo)/2;
  std::nth_element(arr+lo, arr+mid, arr+hi);
  if(arr[mid] > N-mid) hi = mid;
  else lo = mid;
}
n = N-hi;

In the end, we achieve a complexity of O(N) + O(N/2) + O(N/4) + ... = O(2*N) = O(N)

like image 59
i Code 4 Food Avatar answered Oct 23 '22 22:10

i Code 4 Food


No magic involved

If you've been reading the above and thought "How am I ever going to come up with that in an interview" or "Can I really trust this code has no bugs", then look no further! Let me introduce you to the happy world of 'formal program design'!

In this answer I'll explain how we can turn the problem statement into a pair of inequalities which in turn will force our binary search, so there's just one way to write it. I will also catch a couple of bugs and corner cases left out in the previous answers.

Setting it all up

Let's assume we have a sorted, non-empty array of size N=7.

N: 7
    i: 0 1 2 3 4 5 6
ar[i]: 3 3 4 5 6 6 7

What we really want is an i s.t.

ar[i] <= N-i-1

However, we want the largest one, that is the one furthest to the right, so it must be that

ar[i+1] > N-i-1

Getting formal

What we are going to do is to keep two variables lo and hi st. we always have

ar[lo] <= N-lo-1   (1)
ar[hi] > N-hi-1    (2)

(Notice the substitution of i+1 for hi in the second equation).

We will then carefully move the variables towards each other, until lo+1 = hi, at which point we have found the i we originally sought.

Now we need some starting values.

  • A choice for hi could be N. This is out of the range of the array, but we are never going to read it, so we'll just assume it is a huge value that satisfies equation (2).

  • It's harder for lo, because can we even be sure that such a value exists? NO! The array [7,8,9] has no index which satisfies the wanted property, and thus we have found our first corner case. We can assume that if any index satisfies (1) it must be 0, but we have to introduce a test to see if it is actually ok to proceed.

Sweet! We've avoided a nasty bug.

Plugging it into the code

Ok, at this point it's time to invoke binary search. Really the work is already done, and we simply write:

if ar[0] > N-0-1:
    panic("No solutions found!")

lo, hi = 0, N
while lo+1 != hi:
    mid = (lo + hi)/2
    if ar[mid] <= N-mid-1:
        lo = mid
    if ar[mid] > N-mid-1:
        hi = mid

print "The solution is ar[%d] = %d" % (lo, ar[lo])

(Notice that we can change the second if to an else, since the conditions are the inverses of each other)

The result

Running it on the original example gives us:

The solution is ar[2] = 4

For fun, I also tried to run "i Code 4 Food"'s code with the same array. I think he assumes the values are unique, since he returns

lo = 4

Which clearly doesn't work, since ar[4] = 6, and there are only two values after that.

like image 21
Thomas Ahle Avatar answered Oct 23 '22 22:10

Thomas Ahle