Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fast approximate string matching in a large array in ruby

In Ruby I have an array consisting of about a million strings called dictionary_array. I have another array consisting of about thousand strings called arr.

For every element in arr, I want to find an element in dictionary_array which is closest.

Iterating over every element in arr, and for each element in arr iterating over every element in dictionary_array to find the one with the minimum Levenshtein distance is O(n^2) and too slow for my purposes.

Is there a better way to solve this problem ?

like image 550
jimcgh Avatar asked Nov 11 '22 12:11

jimcgh


1 Answers

Found this interesting article by adding precompute to your question:

http://stevehanov.ca/blog/index.php?id=114

Code is in Python but should be possible to translate.

like image 175
seph Avatar answered Nov 15 '22 05:11

seph