I have an array of int sorted in ascending order (the element values are random; the array is sorted with qsort()). The program receives various intervals such as <10;50> or <50;100>. I use the following simple for loop to count how many values in the array fall within the given interval:
for (int i = 0; i < arraySize; i++) {
    if (points[i] >= interval1 && points[i] <= interval2) {
        counter++;
    }
}
I need something faster than this O(n) scan to count the elements of points that fall within the interval. The array can contain millions of elements, so the linear scan slows the program down dramatically. The element values range from 0 to 1000000000 (1e9), and so do the interval bounds.
Use binary search: for the input interval [x, y], find the index of the first element that is greater than or equal to x, find the index of the last element that is less than or equal to y, and the answer is the distance between them plus one.
#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Index of the first element >= val (arr_sz if there is none). */
ssize_t bin_search_first_larger(int arr[], size_t arr_sz, int val) {
    ssize_t l = -1;
    ssize_t r = arr_sz;
    /* invariant: arr[l] < val && val <= arr[r] */
    while (l + 1 != r) {
        ssize_t m = l + (r - l) / 2;
        if (arr[m] < val) {
            l = m;
        } else {
            r = m;
        }
    }
    /* l+1 == r && arr[l] < val && val <= arr[r] */
    return r;
}

/* Index of the last element <= val (-1 if there is none). */
ssize_t bin_search_last_smaller(int arr[], size_t arr_sz, int val) {
    ssize_t l = -1;
    ssize_t r = arr_sz;
    /* invariant: arr[l] <= val && val < arr[r] */
    while (l + 1 != r) {
        ssize_t m = l + (r - l) / 2;
        if (arr[m] <= val) {
            l = m;
        } else {
            r = m;
        }
    }
    /* l+1 == r && arr[l] <= val && val < arr[r] */
    return l;
}

/* Number of elements in the closed interval [x, y]. */
ssize_t values_in(int arr[], size_t arr_sz, int x, int y) {
    ssize_t i = bin_search_first_larger(arr, arr_sz, x);
    ssize_t j = bin_search_last_smaller(arr, arr_sz, y);
    return j - i + 1;
}
The binary search code is adapted from Jon Bentley's Programming Pearls (which is well worth a read), where it is shown how binary search can be modified to return either the first or the last occurrence of a value in a sorted array with duplicates, rather than an arbitrary occurrence. Your use case calls for a similar adaptation; the difference is subtle.
Note that conceptually, arr[-1] is assumed to be negative infinity and arr[N] positive infinity (where N is the size of the array), but obviously the code never actually accesses such elements.
Time complexity is O(log(N)) where N is the size of the array; it's hard (impossible?) to do any better than that.
I ran some tests and it appears to work fine for the general case as well as the edge cases (no elements in the range, y larger than every element, x smaller than every element, or both at once), but as you probably know this does not prove the absence of bugs.