Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithms - Count all pairs of equal numbers in a sorted array in O(n)?

A question that has me speculating is the following:

Let's say we have a sorted array with the numbers {1,1,1,1,2,2,4,4,4}.

Now, given that we can clearly see that we have six pairs on 1's, one pair of 2's and three pairs of 4's (10 pairs). How would you construct an algorithm that finds these pairs in O(n)?

I have an algorithm that counts the pairs in an array and does so like this:

Arrays.sort(array);
int counter = 0; 
for(int i = 0; i<array.length; i++) {
     for(int j = i+1; j<array.length; j++) {
          if(j!=i && array[j] == array[i]) {
	counter++;
	}
      }
}
return counter; 

But this runs in O(N^2) and I guess (being the novice that I am with algorithms) that there's a better solution to this to get the same results using only one for-loop or multiple while-loops?

I'd like to hear your thoughts!

like image 598
Jens O. Anders Olsén Avatar asked Sep 26 '17 12:09

Jens O. Anders Olsén


Video Answer


3 Answers

You can do it in O(N):

int pairs = 0; 
int times = 1;
for (int i = 1; i < array.length; ++i) {
    if (array[i] == array[i-1]) {
        ++times;
    } else {
        pairs += times*(times-1) / 2;
        times = 1;
    }                   
}
pairs += times*(times-1) / 2;
return pairs;       

Runnable: https://ideone.com/mu65ie

For every distinct number, count the number of its occurrences times. The number of different pairs is equal to number of choices C(times, 2) = times*(times-1) / 2.

like image 170
DAle Avatar answered Sep 23 '22 00:09

DAle


Okay so here's also my solution:

 int i = 0;
 int pairs = 0;

 while (i < array.length - 1) {
    if(array[i] == array[i + 1]) {
        pairs += 1;
        i += 2;
    } else {
        i++;
    }
 }

When a pair is found the index is incremented by two, this makes traversal a little bit faster. But the complexity is O(n) anyway.

Of course you run this after the array is sorted.

like image 35
Schidu Luca Avatar answered Sep 19 '22 00:09

Schidu Luca


The secret is to stop reiterating. Cache data as it appears. You can use caching to reduce this into an O(nlogn) problem.

Pairs is very vague wording, so in the future a few more examples will clarify things you don't know a name to find answers for. You can use mathematics of Combinations to reduce the problem into O(n).

The wikipedia article is a bit obtuse to read, but the equation you're looking for is near the top:

n! / (k! * (n - k)!)

where ! indicates a factorial number, n indicates the amount of items to be combined (4 1s), and k indicates the amount of items per combination (2 for a pair). So substituting these values:

4! = 24
2! = 2
(4-2)! = 2
4!/(2!2!) = 24/4 = 6

Using this equation it can be reduced to O(n). Since factorials are used and the dataset is sorted, you can further improve performance by caching the result of the factorial call for future calls. Sorted input for a factorial function will have cache hits for nearly every lookup!

Caching may not be necessary if you are using python 3 as it uses a much more efficient algorithm to calculate factorials than python 2. Caching will reduce redundancy, however which may yield a good result on very large values.

An example of caching (memoization):

import math

class Cache(object):
    memos = dict()

    def factorial(self, n):
         if not n in self.memos:
             self.memos[n] = math.factorial(n)
         return self.memos[n]
like image 30
Aaron3468 Avatar answered Sep 22 '22 00:09

Aaron3468