Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

efficiently find amount of integers in a sorted array

I'm studying for a test, and found this question.

You are given a sorted array of integers for example:

{-5, -5, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99}

Write a method:

Public static int count (int[] a, int x)

that returns the amount of times, number 'x' is in the array.

for example:

x = -5, it returns 2
x = 2, it returns 5
x = 8, it returns 0

I need to write it as efficient as possible, please don't give me the answer (or write it if you wish, but I won't look), my idea is to do a binary search, and then go both edges (backward and forward) of the value that I find and with the index numbers, return the correct answer, my questions are:

  1. is it the most efficient way?
  2. won't it be O(n) in the worst case? (when the array is filled with a single number) -

If so - then why should I do a binary search?

like image 773
Seth Keno Avatar asked Apr 02 '14 14:04

Seth Keno


4 Answers

Modify your binary search to find the first and last occurrence of the given input, then the difference between those two indexes is the result.

To find a first and last occurrence using binary search you need to change a bit from the usual binary search algorithm. In binary search the value is returned when a match is found. But here unlike the usual binary search you need to continue searching until you find a mismatch.

helpful links

finding last occurence, finding first occurance

A bit update

after you find the first occurrence then you can use that index as a starting point of the next binary search to find the last.

like image 125
stinepike Avatar answered Oct 19 '22 00:10

stinepike


Two solutions come to mind:

1) Do Binary Search alright, but maintain that invariant that it finds the first occurence. Then do a linear search. This will be Theta(log n + C) where C is the count.

Programming Pearls by Jon Bentley has a nice write up, where he mentions looking for the first occurence is actually more efficient than looking for any occurence.

2) You can also do two binary searches, one for the first occurence, and one for the last, and take the difference of the indices. This will be Theta(log n).


You can decide which solution to apply based on the expected value of C. If C = o(log n) (yes small o), then looking for the first occurence and doing linear search will be better. Otherwise do two binary searches.

If you don't know the expected value of C, then you might be better off with solution 2.

like image 26
RelevantTitlesAreBestTitles Avatar answered Oct 18 '22 23:10

RelevantTitlesAreBestTitles


Do binary Search to find the first occurence. Do binary search to find the last occurence. The number of occurences is equal to the number of numbers between the 2 indices found.

Working code:

public class Main {
    public static void main(String[] args){
        int[] arr = {-5, -5, 1, 1, 1, 1, 1, 1, 
                                    1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 67, 67, 99};
        int lo = getFirst(arr, -5);
        if(lo==arr.length){ // the number is not present in the array.
            System.out.println(0);
        }else{
            int hi = getLast(arr, -5);
            System.out.println((hi-lo+1));
        }
    }

    // Returns last occurence of num or arr.length if it does not exists in arr.
    static int getLast(int[] arr, int num){
        int lo = 0, hi = arr.length-1, ans = arr.length;
        while(lo<=hi){
            int mid = (lo+hi)/2;
            if(arr[mid]==num){
                ans = mid;
                lo = mid+1;
            }else if(arr[mid]<num){
                lo = mid+1;
            }else if(arr[mid]>num){
                hi = mid-1;
            }
        }
        return ans;
    }

    // Returns first occurence of num or arr.length if it does not exists in arr.
    static int getFirst(int[] arr, int num){
        int lo = 0, hi = arr.length-1, ans = arr.length;
        while(lo<=hi){
            int mid = (lo+hi)/2;
            if(arr[mid]==num){
                ans = mid;
                hi = mid-1;
            }else if(arr[mid]<num){
                lo = mid+1;
            }else if(arr[mid]>num){
                hi = mid-1;
            }
        }
        return ans;
    }
}
like image 5
Nikunj Banka Avatar answered Oct 18 '22 22:10

Nikunj Banka


Actually there is a slightly better solution than the given solutions! It is a combination of two different ways to do binary search.

First you do a binary search to get the first occurence. This is O(log n)

Now, starting with the first index you just found, you do a different kind of binary search: You guess the frequency of that element F, by starting with a guess of F=1 and doubling your estimate and check if the element repeats. This is guaranteed to be O(log F) (where F is the frequency).

This way, the algorithm runs in O(log N + log F)

You do not have to worry about the distribution of the numbers!

like image 1
Ukkonen Avatar answered Oct 19 '22 00:10

Ukkonen