Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to count occurrences of a key in a sorted array

This was asked in on-site Microsoft interview.

Count the number of occurrences of a given key in an array.

I answered linear search because the elements may be scattered in the array. Say the key is found in the beginning and at the end. So we need to scan the entire array.

Next he asked what if the array is sorted?

Thought for a while and said I'll use linear search again. Because the repetitions of the key if present can be anywhere in the array. As an optimization I also said if first and last array elements are same you can take the array length as the answer.

Is my analysis correct in both the cases?

Example:

Input = [0 0 1 1 1 2 2 3 3], key = 1, Answer = 3
Input = [0 0 2 2 3 3],       key = 1, Answer = 0
like image 288
Edward Avatar asked Dec 01 '10 08:12

Edward


3 Answers

For unsorted array there is not much we can do other than linear search.

For sorted array you can do it in O(logN) using a slightly modified binary search:

  • Find the index of first occurrence of key, call it f.
  • Find the index of last occurrence of key, call it l.
  • If the key exists in the array l-f+1 is the answer.

Finding the first occurrence:

arr[i] is the first occurrence of key iff

  • arr[i] == key and either
    • i == 0 ( it's the first element of the array) or
    • arr[i-1] != key (it's not the first element of the array and element to it's left is different)

You can slightly modify the binary search to find the first occurrence.
In a binary search you terminate the search when you find arr[mid] == key.
Modify the condition such that you terminate the search when you find the first occurrence instead of any occurrence.

Algorithm:

low = 0
high = arrSize - 1 

while low <=  high

  mid = (low + high) / 2

  //if arr[mid] == key         // CHANGE
  if arr[mid] == key AND ( mid == 0 OR arr[mid-1] != key )
    return mid
  //else if ( key < arr[mid] ) // CHANGE
  else if ( key <= arr[mid] ) 
    high = mid - 1
  else
    low = mid + 1        
  end-if

end-while

return -1

Similarly you can find the last occurrence.

like image 84
codaddict Avatar answered Sep 25 '22 23:09

codaddict


For once, I will propose an implementation in C++.

size_t count(std::vector<int> const& vec, int key)
{
  auto p = std::equal_range(vec.begin(), vec.end(), key);
  return std::distance(p.first, p.second);
}

equal_range uses a binary search, the result is equivalent to:

std::make_pair(std::lower_bound(vec.begin(), vec.end(), key),
               std::upper_bound(vec.begin(), vec.end(), key));

but the implementation should makes it slightly faster, even though all are in O(log N) (in terms of number of comparison).

like image 41
Matthieu M. Avatar answered Sep 26 '22 23:09

Matthieu M.


#include<stdio.h>
int binarysearch(int a[],int n,int k,bool searchfirst){
    int result=-1;
    int low=0,high=n-1;
    while(low<=high){
        int mid=(low+high)/2;
        if(a[mid]==k)  {
              result=mid; 
           if(searchfirst)
              high=mid-1; 
            else
              low=mid+1;
    }
    else if(k<a[mid])  high=mid-1;
    else low=mid+1;
    }
    return result;
}

int main(){
    int a[]={1,1,1,2,2,3,3,3,6,6,6,6,6,7,7};
    int n=sizeof(a)/sizeof(a[0]);
    int x=6;
    int firstindex=binarysearch(a,n,x,true);
    printf("%d\n",firstindex);
    if(firstindex==-1){
        printf("elment not found in the array:\n ");
    }
    else {
        int lastindex=binarysearch(a,n,x,false);
        printf("%d\n",lastindex);
        printf("count is = %d", lastindex-firstindex+1);
    }

}
like image 35
yahoo Avatar answered Sep 22 '22 23:09

yahoo