Interpolation search on strings

Tags: arrays, string, search, sorted, interpolation

For those of you not familiar with interpolation search, it is a method for searching a sorted array that is potentially faster than binary search. You look at the first and last elements and (assuming that the contents of the array are uniformly distributed) linearly interpolate to predict the location of the value.

For example: we have an array of length 100 with array[0]=0 and array[99]=99. If we are looking for 80, it is intuitive to try array[80] rather than array[50], and if the array is close to uniformly distributed, the expected runtime is reduced to O(log(log(N))).

For numbers, the location to check is defined by the equation: low + ((toFind - sortedArray[low]) * (high - low)) / (sortedArray[high] - sortedArray[low]).
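
A minimal Python sketch of the numeric case, assuming integer values. The names `interpolation_search`, `sorted_array` and `to_find` are illustrative, not from any library. The probe is scaled by `high - low` so that it never lands past `high`, and there is a guard for the case where both endpoints hold the same value, which would otherwise divide by zero.

```python
def interpolation_search(sorted_array, to_find):
    """Return an index of to_find in sorted_array, or -1 if it is absent."""
    low, high = 0, len(sorted_array) - 1
    # Only keep going while to_find can still lie inside [low, high].
    while low <= high and sorted_array[low] <= to_find <= sorted_array[high]:
        if sorted_array[high] == sorted_array[low]:
            # Every remaining value is equal, so it must equal to_find.
            return low
        # Linear interpolation between the two endpoints (integer division).
        mid = low + ((to_find - sorted_array[low]) * (high - low)) // (
            sorted_array[high] - sorted_array[low]
        )
        if sorted_array[mid] == to_find:
            return mid
        if sorted_array[mid] < to_find:
            low = mid + 1
        else:
            high = mid - 1
    return -1


print(interpolation_search(list(range(100)), 80))  # 80, found on the first probe
```

On the 0..99 example the first probe is already index 80; on skewed data the probe can be far off, which is why the loop still narrows `low` and `high` like binary search does.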

A common example used to show off the intuitive nature of interpolation search is this: imagine trying to find the word 'yellow' in a dictionary. You wouldn't use binary search and go to the halfway point. Rather, you would go to the expected location.

Humans can naturally linearly interpolate strings, but I can't figure out how to code it up. How do we linearly interpolate strings?

asked Sep 07 '10 18:09

user108088

1 Answer

To find the "distance" between two strings, a simple method would be to look at the first letter that is different between them and assign a numeric value to each, then take the difference.

For example, the distance from "a" to "y" would be 24 and the distance from "y" to "z" would be 1, if each letter were assigned a value equal to its position in the alphabet.
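
As a rough sketch of that simple method in Python, assuming lowercase a-z words only: the helper names `letter_value` and `first_difference_distance` are made up for illustration, and treating a missing letter as 0 (so that a prefix sorts before its extensions) is my own assumption.

```python
def letter_value(word, index):
    """Position in the alphabet (a=1 .. z=26); 0 if the word is too short."""
    if index >= len(word):
        return 0
    return ord(word[index]) - ord("a") + 1


def first_difference_distance(a, b):
    """Signed distance from a to b, taken at the first letter that differs."""
    for index in range(max(len(a), len(b))):
        value_a = letter_value(a, index)
        value_b = letter_value(b, index)
        if value_a != value_b:
            return value_b - value_a
    return 0  # the two words are identical


print(first_difference_distance("a", "y"))  # 24
print(first_difference_distance("y", "z"))  # 1
```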

A better-performing method would go through a dictionary to weight the various letters by how common they are in actual words.
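
One possible reading of that weighting, sketched in Python: replace each letter's alphabet position with the fraction of dictionary words that start with an earlier letter. The function name `cumulative_letter_positions` and the tiny word list are hypothetical, and using only first letters is an assumption made to keep the example short.

```python
from collections import Counter
import string


def cumulative_letter_positions(dictionary_words):
    """Map each letter to the fraction of words starting with an earlier letter."""
    counts = Counter(word[0] for word in dictionary_words if word)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one non-empty word")
    positions = {}
    running = 0
    for letter in string.ascii_lowercase:
        positions[letter] = running / total
        running += counts[letter]  # Counter returns 0 for unseen letters
    return positions


# Hypothetical four-word "dictionary": half of the words start with 'a'.
positions = cumulative_letter_positions(["apple", "avocado", "banana", "cherry"])
print(positions["b"])  # 0.5
```

A search would then interpolate on `positions[letter]` instead of the raw alphabet position, so common starting letters get a proportionally wider slice of the array.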

Another refinement would be to look at two characters - "aa" is farther from "bz" than "az" is from "ba", for example. Going beyond two characters wouldn't buy you much.
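
Here is a sketch of how the two-character idea could drive an actual search, again assuming lowercase a-z words. Everything named here (`string_key`, `common_prefix_length`, `interpolation_search_strings`) is illustrative. The letters are read as a base-27 number, with 0 reserved for "no letter", starting after the prefix shared by the two endpoints; when the keys of the endpoints are equal it falls back to the midpoint, as binary search would.

```python
def letter_value(word, index):
    """1..26 for 'a'..'z'; 0 when the word has no letter at this index."""
    if index >= len(word):
        return 0
    return ord(word[index]) - ord("a") + 1


def string_key(word, start, width=2):
    """Read `width` letters beginning at `start` as a base-27 number."""
    key = 0
    for index in range(start, start + width):
        key = key * 27 + letter_value(word, index)
    return key


def common_prefix_length(a, b):
    length = 0
    while length < len(a) and length < len(b) and a[length] == b[length]:
        length += 1
    return length


def interpolation_search_strings(words, target, width=2):
    """Return the index of target in the sorted list words, or -1."""
    low, high = 0, len(words) - 1
    while low <= high and words[low] <= target <= words[high]:
        # Letters shared by both endpoints carry no information; skip them.
        start = common_prefix_length(words[low], words[high])
        low_key = string_key(words[low], start, width)
        high_key = string_key(words[high], start, width)
        if high_key == low_key:
            mid = (low + high) // 2  # keys cannot separate them: bisect
        else:
            target_key = string_key(target, start, width)
            mid = low + ((target_key - low_key) * (high - low)) // (
                high_key - low_key
            )
            mid = max(low, min(high, mid))  # guard against unexpected characters
        if words[mid] == target:
            return mid
        if words[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


words = ["apple", "banana", "cherry", "grape", "lemon",
         "mango", "peach", "yellow", "zebra"]
print(interpolation_search_strings(words, "yellow"))  # 7
```

In this small list "yellow" is found on the first probe, much like opening a paper dictionary near the back.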

The reason this method isn't more popular is that it complicates the binary search algorithm for not a lot of gain. If you were to time it, you might even find that standard binary search is faster; what you gain in fewer comparisons you lose in the complexity of determining distances.

Also note that the worst-case performance of this algorithm is worse than that of binary search. Consider, for example, searching for "ae" in the list "aa","ab","ac","ad","ae","zz" - the outlier "zz" is going to bias the search so that it always tries the beginning of the search range. Each probe then eliminates only one element, so the search degrades to O(n) under these conditions.
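
To make the degradation concrete, here is a small probe counter in Python. It assumes a one-letter variant of the search that interpolates on the first position where the two endpoints differ; `count_probes` and `letter_value` are hypothetical helper names.

```python
def letter_value(word, index):
    """1..26 for 'a'..'z'; 0 when the word has no letter at this index."""
    if index >= len(word):
        return 0
    return ord(word[index]) - ord("a") + 1


def count_probes(words, target):
    """Count the elements inspected by a one-letter interpolation search."""
    low, high = 0, len(words) - 1
    probes = 0
    while low <= high and words[low] <= target <= words[high]:
        # First position at which the two endpoints differ.
        pos = 0
        while (pos < len(words[low]) and pos < len(words[high])
               and words[low][pos] == words[high][pos]):
            pos += 1
        low_value = letter_value(words[low], pos)
        high_value = letter_value(words[high], pos)
        if high_value == low_value:
            mid = (low + high) // 2
        else:
            target_value = letter_value(target, pos)
            mid = low + ((target_value - low_value) * (high - low)) // (
                high_value - low_value
            )
        probes += 1
        if words[mid] == target:
            return probes
        if words[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return probes


print(count_probes(["aa", "ab", "ac", "ad", "ae", "zz"], "ae"))  # 5
```

Because "zz" stays as the upper endpoint, every estimate lands on `low` itself, so the search walks through the list one element at a time. Binary search would need at most 3 probes on the same six-element list.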

answered Sep 19 '22 15:09

Mark Ransom
