Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are there any algorithms that would find the closest match to a string from a collection of strings?

Are there any algorithms that would find the closest match to a string from a collection of strings? For example:

string_to_match = 'What color is the sky?'

strings = [
  'What colour is the sea?', 
  'What colour is the sky?', 
  'What colour is grass?', 
  'What colour is earth?'
]

answer = method_using_string_matching_algorithm(string_to_match, strings)
answer # returns strings[1] 'What colour is the sky?'
like image 727
Neil Avatar asked Oct 15 '25 15:10

Neil


1 Answers

The search terms you're looking for are "string distance algorithms" and "approximate string matching." A quick check of Google turns up interesting options such as:

  • Sift3 Distance
  • Levenshtein Distance
  • Optimal String Alignment Distance
  • Damerau-Levenshtein Distance
  • Qwerty Keyboard Distance

Some useful links include:

  • https://github.com/Kicksend/mailcheck/wiki/String-Distance-Algorithms
  • http://en.wikipedia.org/wiki/Approximate_string_matching

As of this writing, Debian-based Linux distributions also include agrep and TRE-agrep in their repositories.

like image 164
Todd A. Jacobs Avatar answered Oct 18 '25 05:10

Todd A. Jacobs