I have a list of words
list = ['car', 'animal', 'house', 'animation']
and I want to compare every list item with a string str1; the output should be the most similar word. Example: if str1 is anlmal, then animal is the most similar word. How can I do this in Python? The words in my list are usually easy to distinguish from each other.
Similarity of strings can also be checked by character frequency: for every character, the difference between its count in the two strings should not be greater than a threshold, represented here by K. Explanation: if 'a' occurs 4 times in str1 and 2 times in str2, then 4 - 2 = 2, which is in range for K = 2; if all other characters are likewise in range, the result is true.
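The character-frequency criterion can be sketched as follows. This assumes the intended rule is that no character's count may differ by more than K between the two strings; the function name and the sample strings are hypothetical and only illustrate the idea.

```python
from collections import Counter

def similar_by_char_frequency(str1, str2, k):
    """Return True if no character's count differs by more than k."""
    counts1 = Counter(str1)
    counts2 = Counter(str2)
    # A Counter returns 0 for characters it has not seen, so characters
    # present in only one of the strings are handled as well.
    all_chars = counts1.keys() | counts2.keys()
    return all(abs(counts1[ch] - counts2[ch]) <= k for ch in all_chars)

# 'a' occurs 4 times in the first string and 2 times in the second.
print(similar_by_char_frequency('aaaab', 'aab', 2))
print(similar_by_char_frequency('aaaab', 'aab', 1))
```

Note that this only answers yes or no for a pair of strings; it does not rank a list of candidates by similarity.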
Use difflib:
difflib.get_close_matches(word, ['car', 'animal', 'house', 'animation'])
As you can see from perusing the source, the "close" matches are sorted from best to worst.
>>> import difflib
>>> difflib.get_close_matches('anlmal', ['car', 'animal', 'house', 'animation'])
['animal']
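If only the single best word is needed, something like the following sketch may help. get_close_matches accepts the optional parameters n (maximum number of matches, default 3) and cutoff (minimum similarity, default 0.6); the helper name most_similar is hypothetical.

```python
import difflib

def most_similar(word, candidates):
    """Return the candidate closest to word, or None if candidates is empty."""
    # n=1 keeps only the best match; cutoff=0.0 disables the default
    # 0.6 similarity threshold, so any non-empty list yields a result.
    matches = difflib.get_close_matches(word, candidates, n=1, cutoff=0.0)
    return matches[0] if matches else None

words = ['car', 'animal', 'house', 'animation']
print(most_similar('anlmal', words))

# The similarity scores themselves can be inspected with SequenceMatcher.
for candidate in words:
    score = difflib.SequenceMatcher(None, candidate, 'anlmal').ratio()
    print(candidate, round(score, 3))
```

Lowering cutoff all the way to 0.0 means even a very poor match is returned, so keeping the default threshold may be preferable when "no match" is a valid outcome.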
I checked difflib.get_close_matches(), but it didn't work correctly for me. Here is an alternative that counts the characters that match at the same position in both strings; use it as:
closest_match, closest_match_idx = find_closet_match(test_str, list2check)
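import numpy

def find_closet_match(test_str, list2check):
    # Score each candidate by the number of positions at which its
    # character equals the character at the same position in test_str.
    scores = {}
    for ii in list2check:
        cnt = 0
        # Iterate over the shorter of the two strings to avoid an IndexError.
        if len(test_str) <= len(ii):
            str1, str2 = test_str, ii
        else:
            str1, str2 = ii, test_str
        for jj in range(len(str1)):
            cnt += 1 if str1[jj] == str2[jj] else 0
        scores[ii] = cnt
    scores_values = numpy.array(list(scores.values()))
    # argsort is ascending, so the last index belongs to the highest score.
    closest_match_idx = numpy.argsort(scores_values, axis=0, kind='quicksort')[-1]
    closest_match = numpy.array(list(scores.keys()))[closest_match_idx]
    return closest_match, closest_match_idx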
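For comparison, the same positional scoring can be sketched without numpy. This is not the answer's code: the name positional_match is hypothetical, and on ties it returns the first best candidate, which may differ from the argsort-based version.

```python
def positional_match(test_str, candidates):
    """Return (best_candidate, index) by counting same-position matches."""
    def score(candidate):
        # zip stops at the shorter string, so no index can go out of range.
        return sum(a == b for a, b in zip(test_str, candidate))

    # max returns the first index with the highest score on ties.
    best_idx = max(range(len(candidates)), key=lambda i: score(candidates[i]))
    return candidates[best_idx], best_idx

words = ['car', 'animal', 'house', 'animation']
print(positional_match('anlmal', words))
```

A limitation of position-by-position comparison is that a missing or extra character shifts everything after it. For example, with the input 'nimal' every word in this list scores 0, so the result is decided only by tie-breaking; difflib's sequence matching does not have that problem.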