What's wrong with this Interpolation search implementation?

Question

This is a common C/C++ implementation of the Interpolation Search algorithm found around the Internet. However, when used with a sorted array of some 100000 integers, the mid-variable starts generating negative array-indexes, causing a Segmentation Fault. What could the problem be?

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int interpolationSearch(int sortedArray[], int toFind, int len) {
    // Returns index of toFind in sortedArray, or -1 if not found
    int low = 0;
    int high = len - 1;
    int mid;

    while (sortedArray[low] <= toFind && sortedArray[high] >= toFind) {
        mid = low + ((toFind - sortedArray[low]) * (high - low)) /
              (sortedArray[high] - sortedArray[low]);

        if (sortedArray[mid] < toFind) {
            low = mid + 1;
        } else if (sortedArray[mid] > toFind) {
            high = mid - 1;
        } else {
            return mid;
        }
    }

    if (sortedArray[low] == toFind)
        return low;
    else
        return -1; // Not found
}

int main(void) {
    srand(time(0));
    int arr[100000];
    for (int i=0; i<100000; i++) {
        arr[i] = rand()%100000;
    }

    int length = sizeof(arr)/sizeof(int);
    qsort(arr,length,sizeof(int),order);

    for (int j=0; j<10000; j++) {
        interpolationSearch(arr,rand()%100000,length);
    }
}

mcmcc · Accepted Answer

The sub-expression: ((toFind - sortedArray[low]) * (high - low))

... can easily evaluate to something like: ((99999-0) * (99999-0)) == 99999^2

... which is much larger than 2^31 (== the range of 32-bit signed integers).

Once it exceeds 2^31-1, the integer will overflow into negative numbers, hence your negative indices. If it exceeds 2^32 (which it also could do), then (most likely, technically undefined) you'll lose the high-order bits and you'll end up with effectively random offsets, both positive and negative.

To avoid all of this, you need to do your math carefully to make sure none of your sub-expressions yield an integer overflow. Usually the easiest way to do this is to convert to floating-point whose range is many orders of magnitude larger than 32-bit integers.

In the final analysis, interpolation such as this for binary search is usually not worth it -- the expense of computing the interpolant is typically greater than the few extra iterations of the loop that it "saves".

What's wrong with this Interpolation search implementation?

Tags:

c++

c

algorithm

search

Gorkamorka

1 Answers

mcmcc

Recent Activity

Donate For Us

What's wrong with this Interpolation search implementation?

Tags:

c++

c

algorithm

search

Gorkamorka

1 Answers

mcmcc

Related questions

Recent Activity

Donate For Us