Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to determine values matching an interval in an array

I have a sorted array of int going from x to y (the values of the elements are random but sorted in ascending order using qsort()). The program receives various intervals like <10;50>, or <50;100>. I have the following simple for loop to determine if the values in the array are in the set interval, if yes add one to the counter.

 for(int i = 0; i < arraySize ;i++ )  {        
       if (points[i] >= interval1 && points[i] <= interval2){
            counter++;               
        }
    }

I need a faster way than O(n) to search through the array, and determine if the value in points[i] is in the set interval or not. The values can be in the millions, hence slowing down dramatically.

The elements in the array can range from 0 to 1000000000 (1e9). The intervals respectively.

like image 259
Supercan Avatar asked Oct 19 '22 20:10

Supercan


1 Answers

Use binary search - for the input interval [i, j], find the index of the smallest integer that is larger than i, find the index of the largest integer that is smaller than j, and then return the distance between them.

ssize_t bin_search_first_larger(int arr[], size_t arr_sz, int val) {
    ssize_t l = -1;
    ssize_t r = arr_sz;
    /* invariant: arr[l] < val && val <= arr[r] */
    while (l+1 != r) {
        ssize_t m = l+(r-l)/2;
        if (arr[m] < val) {
            l = m;
        } else {
            r = m;
        }
    }
    /* l+1 == r && arr[l] < val && val <= arr[r] */
    return r;
}

ssize_t bin_search_last_smaller(int arr[], size_t arr_sz, int val) {
    ssize_t l = -1;
    ssize_t r = arr_sz;
    /* invariant: arr[l] <= val && val < arr[r] */
    while (l+1 != r) {
        ssize_t m = l+(r-l)/2;
        if (arr[m] <= val) {
            l = m;
        } else {
            r = m;
        }
    }
    /* l+1 == r && arr[l] <= val && val < arr[r] */
    return l;
}

ssize_t values_in(int arr[], size_t arr_sz, int x, int y) {
    ssize_t i = bin_search_first_larger(arr, arr_sz, x);
    ssize_t j = bin_search_last_smaller(arr, arr_sz, y);
    return j-i+1;
}

The binary search code is adapted from Jon Bentley's Programming Pearls (which is well worth a read), where it is shown how binary search can be modified to return either the first occurrence or the last occurrence of a value in a sorted array with duplicates (rather than returning an arbitrary occurrence of a duplicate value). The process is similar for your use case, the difference is subtle.

Note that conceptually, it is assumed that arr[-1] is negative infinity, and arr[N] is positive infinity (where N is the size of the array), but obviously, the code never attempts to access such elements.

Time complexity is O(log(N)) where N is the size of the array, it's hard (impossible?) to get any better than that.

I ran some tests and it appears to work fine for the general case and also for edge cases (no elements in the range, or y larger than every element, or x smaller than every element, or both x smaller than every element and y larger than every element), but as you probably know this does not prove the absence of bugs.

like image 124
Filipe Gonçalves Avatar answered Oct 21 '22 14:10

Filipe Gonçalves