A question that has me speculating is the following:
Let's say we have a sorted array with the numbers {1,1,1,1,2,2,4,4,4}.
Now, given that we can clearly see that we have six pairs on 1's, one pair of 2's and three pairs of 4's (10 pairs). How would you construct an algorithm that finds these pairs in O(n)?
I have an algorithm that counts the pairs in an array and does so like this:
Arrays.sort(array);
int counter = 0;
for(int i = 0; i<array.length; i++) {
for(int j = i+1; j<array.length; j++) {
if(j!=i && array[j] == array[i]) {
counter++;
}
}
}
return counter;
But this runs in O(N^2) and I guess (being the novice that I am with algorithms) that there's a better solution to this to get the same results using only one for-loop or multiple while-loops?
I'd like to hear your thoughts!
You can do it in O(N)
:
int pairs = 0;
int times = 1;
for (int i = 1; i < array.length; ++i) {
if (array[i] == array[i-1]) {
++times;
} else {
pairs += times*(times-1) / 2;
times = 1;
}
}
pairs += times*(times-1) / 2;
return pairs;
Runnable: https://ideone.com/mu65ie
For every distinct number, count the number of its occurrences times
. The number of different pairs is equal to number of choices C(times, 2) = times*(times-1) / 2
.
Okay so here's also my solution:
int i = 0;
int pairs = 0;
while (i < array.length - 1) {
if(array[i] == array[i + 1]) {
pairs += 1;
i += 2;
} else {
i++;
}
}
When a pair is found the index is incremented by two, this makes traversal a little bit faster. But the complexity is O(n)
anyway.
Of course you run this after the array is sorted
.
The secret is to stop reiterating. Cache data as it appears. You can use caching to reduce this into an O(nlogn) problem.
Pairs is very vague wording, so in the future a few more examples will clarify things you don't know a name to find answers for. You can use mathematics of Combinations to reduce the problem into O(n).
The wikipedia article is a bit obtuse to read, but the equation you're looking for is near the top:
n! / (k! * (n - k)!)
where !
indicates a factorial number, n
indicates the amount of items to be combined (4 1s), and k
indicates the amount of items per combination (2 for a pair). So substituting these values:
4! = 24
2! = 2
(4-2)! = 2
4!/(2!2!) = 24/4 = 6
Using this equation it can be reduced to O(n). Since factorials are used and the dataset is sorted, you can further improve performance by caching the result of the factorial call for future calls. Sorted input for a factorial function will have cache hits for nearly every lookup!
Caching may not be necessary if you are using python 3 as it uses a much more efficient algorithm to calculate factorials than python 2. Caching will reduce redundancy, however which may yield a good result on very large values.
An example of caching (memoization):
import math
class Cache(object):
memos = dict()
def factorial(self, n):
if not n in self.memos:
self.memos[n] = math.factorial(n)
return self.memos[n]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With