Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's wrong with this Interpolation search implementation?

This is a common C/C++ implementation of the Interpolation Search algorithm found around the Internet. However, when used with a sorted array of some 100000 integers, the mid-variable starts generating negative array-indexes, causing a Segmentation Fault. What could the problem be?

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int interpolationSearch(int sortedArray[], int toFind, int len) {
    // Returns index of toFind in sortedArray, or -1 if not found
    int low = 0;
    int high = len - 1;
    int mid;

    while (sortedArray[low] <= toFind && sortedArray[high] >= toFind) {
        mid = low + ((toFind - sortedArray[low]) * (high - low)) /
              (sortedArray[high] - sortedArray[low]);

        if (sortedArray[mid] < toFind) {
            low = mid + 1;
        } else if (sortedArray[mid] > toFind) {
            high = mid - 1;
        } else {
            return mid;
        }
    }

    if (sortedArray[low] == toFind)
        return low;
    else
        return -1; // Not found
}

int main(void) {
    srand(time(0));
    int arr[100000];
    for (int i=0; i<100000; i++) {
        arr[i] = rand()%100000;
    }

    int length = sizeof(arr)/sizeof(int);
    qsort(arr,length,sizeof(int),order);

    for (int j=0; j<10000; j++) {
        interpolationSearch(arr,rand()%100000,length);
    }
}
like image 807
Gorkamorka Avatar asked Dec 16 '22 18:12

Gorkamorka


1 Answers

The sub-expression: ((toFind - sortedArray[low]) * (high - low))

... can easily evaluate to something like: ((99999-0) * (99999-0)) == 99999^2

... which is much larger than 2^31 (== the range of 32-bit signed integers).

Once it exceeds 2^31-1, the integer will overflow into negative numbers, hence your negative indices. If it exceeds 2^32 (which it also could do), then (most likely, technically undefined) you'll lose the high-order bits and you'll end up with effectively random offsets, both positive and negative.

To avoid all of this, you need to do your math carefully to make sure none of your sub-expressions yield an integer overflow. Usually the easiest way to do this is to convert to floating-point whose range is many orders of magnitude larger than 32-bit integers.

In the final analysis, interpolation such as this for binary search is usually not worth it -- the expense of computing the interpolant is typically greater than the few extra iterations of the loop that it "saves".

like image 103
mcmcc Avatar answered Jan 23 '23 06:01

mcmcc