This is a common C/C++ implementation of the Interpolation Search algorithm found around the Internet. However, when used with a sorted array of some 100000 integers, the mid-variable starts generating negative array-indexes, causing a Segmentation Fault. What could the problem be?
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int interpolationSearch(int sortedArray[], int toFind, int len) {
// Returns index of toFind in sortedArray, or -1 if not found
int low = 0;
int high = len - 1;
int mid;
while (sortedArray[low] <= toFind && sortedArray[high] >= toFind) {
mid = low + ((toFind - sortedArray[low]) * (high - low)) /
(sortedArray[high] - sortedArray[low]);
if (sortedArray[mid] < toFind) {
low = mid + 1;
} else if (sortedArray[mid] > toFind) {
high = mid - 1;
} else {
return mid;
}
}
if (sortedArray[low] == toFind)
return low;
else
return -1; // Not found
}
int main(void) {
srand(time(0));
int arr[100000];
for (int i=0; i<100000; i++) {
arr[i] = rand()%100000;
}
int length = sizeof(arr)/sizeof(int);
qsort(arr,length,sizeof(int),order);
for (int j=0; j<10000; j++) {
interpolationSearch(arr,rand()%100000,length);
}
}
The sub-expression: ((toFind - sortedArray[low]) * (high - low))
... can easily evaluate to something like: ((99999-0) * (99999-0)) == 99999^2
... which is much larger than 2^31 (== the range of 32-bit signed integers).
Once it exceeds 2^31-1, the integer will overflow into negative numbers, hence your negative indices. If it exceeds 2^32 (which it also could do), then (most likely, technically undefined) you'll lose the high-order bits and you'll end up with effectively random offsets, both positive and negative.
To avoid all of this, you need to do your math carefully to make sure none of your sub-expressions yield an integer overflow. Usually the easiest way to do this is to convert to floating-point whose range is many orders of magnitude larger than 32-bit integers.
In the final analysis, interpolation such as this for binary search is usually not worth it -- the expense of computing the interpolant is typically greater than the few extra iterations of the loop that it "saves".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With