I have a string that is the correct spelling of a word:
FOO
I would allow someine to mistype the word in such ways:
FO, F00, F0O ,FO0
Is there a nice way to check for this ? Lower case should also be seen as correct, or convert to upper case. What ever would be the prettiest.
One approach is to calculate the edit distance between the strings. You can for example use the Levenshtein distance, or invent your own distance function that considers 0 and O more close than 0 and P, for example.
Another is to transform each word into a canonical form, and compare canonical forms. You can for example convert the string to uppercase, replace all 0s with Os, 1s with Is, etc., then remove duplicated letters.
>>> import itertools
>>> def canonical_form(s):
s = s.upper()
s = s.replace('0', 'O')
s = s.replace('1', 'I')
s = ''.join(k for k, g in itertools.groupby(s))
return s
>>> canonical_form('FO')
'FO'
>>> canonical_form('F00')
'FO'
>>> canonical_form('F0O')
'FO'
The builtin module difflib has a get_close_matches function.
You can use it like this:
>>> import difflib
>>> difflib.get_close_matches('FO', ['FOO', 'BAR', 'BAZ'])
['FOO']
>>> difflib.get_close_matches('F00', ['FOO', 'BAR', 'BAZ'])
[]
>>> difflib.get_close_matches('F0O', ['FOO', 'BAR', 'BAZ'])
['FOO']
>>> difflib.get_close_matches('FO0', ['FOO', 'BAR', 'BAZ'])
['FOO']
Notice that it doesn't match one of your cases. You could lower the cutoff parameter to get a match:
>>> difflib.get_close_matches('F00', ['FOO', 'BAR', 'BAZ'], cutoff=0.3)
['FOO']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With