I have two unsorted arrays a
and b
. For every element a[i]
I need to find the number of elements b[j]
such that b[j] < a[i]
. In addition b
may contain duplicates which should not be counted. Both arrays may be very large.
I tried (for a single query x
)
public static void main(String arg[]) {
int x = 5;
int b[] = {2, 4, 3, 4, 11, 13, 17};
Arrays.sort(b);
int count = 0;
for(int i = 0; i < b.length; ++i) {
if(b[i] < x) {
if(i == 0)
++count;
else {
// this check avoids counting duplicates
if(b[i - 1] != b[i])
++count;
}
} else {
break;
}
}
System.out.println(count);
}
My problem is that this doesn't perform well enough when querying all elements of a
iteratively. What can I do to speed this up?
EDIT: given the later comments, some updates which I just put right in the beginning; leaving my first text at the bottom.
So, the core aspects here are:
OK, so about your problem. Thing is: actually, that is just "work". There is no magic there. As you have two very large arrays, working on unsorted data is an absolute no-go.
So, you start by sorting both arrays.
Then you iterate over the first array and while doing that, you also look into the second array:
int indexWithinB = 0;
int counterForCurrentA = 0; // and actually ALL values from a before
for (int i=0; i<a.length; i++) {
int currentA = a[i];
while (b[indexWithinB] < currentA) {
if (indexWithinB > 0) { // check required to avoid using 0-1
if (b[indexWithinB-1] != b[indexWithinB] { // avoid counting duplicates!
counterForCurrentA++;
}
}
indexWithinB++;
}
// while loop ended, this means: b[indexWithinB] == or > currentA
// this also means: counterForCurrentA ... should have the correct value
}
The above is obviously pseudo code. It is meant to keep you going; and it might very well be, that there are subtle errors in there. For example, as Andreas pointed out: the above needs to be enhanced to check for b.length, too. But that is left as exercise to the reader.
That is what I meant with "just work": you simply have to sit down, write testcases and refine my draft algorithm until it does the job for you. If you find it too hard to program this initially, then take a piece of paper, put down two arrays with numbers ... and do that counting manually.
Final hint: I suggest to write plenty of unit tests to test your algorithm (such stuff is perfect for unit tests); and make sure that you have all your corner cases in such tests. You want to be 100% sure that your algorithm is correct before going for your 10^5 element arrays!
And here, as promised, the original answer:
Simply spoken: iterating and counting is the most efficient way to solve this problem. So in your above case, leaving out the sorting might lead to quicker overall execution time.
The logic there is really simple: in order to know the count of numbers smaller than x ... you have to look at all of them. Thus you have to iterate the complete array (when that array is not sorted).
Thus, given your initial statement, there is no other thing than: iterate and count.
Of course, if you have to this counting multiple times ... it might be worth sorting that data initially. Because then you can use binary search, and getting that count you are looking for works without iterating all data.
And: what makes you think that iterating an array with 10^5 elements is a problem? In other words: are you just worried about a potential performance problem, or do you have a real performance problem? You see, at some point you probably had to create and fill that array. That for sure took more time (and resources) than a simple for-loop to count entries. And honestly: unless we are talking some small embedded device ... 10^5 elements ... that is close to nothing, even when using slightly outdated hardware.
Finally: when you are worried about runtime, the simple answer is: slice your input data, and use 2,4, 8, ... threads to count each slice in parallel! But as said: before writing that code, I would do some profiling be sure that you really have to spent precious development time on this. Don't solve hypothetical performance problems; focus on those that really matter to you or your users!
Comapring every item in the array with x will take you O(n) time. Sorting the array will take O(n log n), and then you can use binary search, which is O(log n), and you get a total of O(n log n). So the most efficient way is also the trivial one - just loop thru the array and compare every item with x.
public static void main(String arg[] ){
int b[]={2, 4, 3, 4, 11, 13, 17};
int x=5;
int count=0;
for(int i=0;i<b.length;i++){
if(b[i]<x){
count++;
}
}
System.out.println(count);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With