Comparing strings in python to find errors

I have a string that is the correct spelling of a word:

FOO

I would like to allow someone to mistype the word in ways such as:

FO, F00, F0O, FO0

Is there a nice way to check for this? Lowercase input should also be accepted as correct, or be converted to uppercase, whichever is prettiest.

Harry asked Jan 17 '26 20:01


2 Answers

One approach is to calculate the edit distance between the strings. For example, you can use the Levenshtein distance, or write your own distance function that treats 0 and O as closer than 0 and P.
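As a rough illustration of the custom distance idea, here is a minimal sketch of a Levenshtein-style distance in which look-alike characters are cheaper to substitute. The names `LOOKALIKES`, `substitution_cost` and `weighted_distance`, and the cost of 0.5, are assumptions made for this example, not part of any library.

```python
# Hypothetical table of characters that are easy to confuse; extend as needed.
LOOKALIKES = {frozenset("0O"), frozenset("1I")}


def substitution_cost(a, b):
    """Cost of replacing character a with character b."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in LOOKALIKES:
        return 0.5  # assumed: a look-alike swap counts as half an edit
    return 1.0


def weighted_distance(s, t):
    """Levenshtein distance with cheaper substitutions for look-alikes."""
    s, t = s.upper(), t.upper()  # compare case-insensitively
    # previous[j] is the distance between the processed prefix of s and t[:j]
    previous = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        current = [float(i)]
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1.0,                          # delete a
                current[j - 1] + 1.0,                       # insert b
                previous[j - 1] + substitution_cost(a, b),  # substitute
            ))
        previous = current
    return previous[-1]


for attempt in ['FO', 'F00', 'F0O', 'FO0', 'FPO']:
    print(attempt, weighted_distance(attempt, 'FOO'))
```

You would then accept an input when its distance to the correct word falls below a threshold of your choosing. Note that with these assumed costs F00 (two look-alike swaps) and FPO (one ordinary swap) both come out at 1.0, so the costs and the threshold probably need tuning for your data.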

Another is to transform each word into a canonical form and compare the canonical forms. For example, you can convert the string to uppercase, replace all 0s with Os, 1s with Is, etc., then collapse runs of repeated letters. The correct word has to be canonicalized too, since FOO itself becomes FO.

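>>> import itertools
>>> def canonical_form(s):
...     s = s.upper()            # ignore case
...     s = s.replace('0', 'O')  # zero looks like the letter O
...     s = s.replace('1', 'I')  # one looks like the letter I
...     # keep one character from each run of consecutive repeats
...     s = ''.join(k for k, g in itertools.groupby(s))
...     return s
...
>>> canonical_form('FO')
'FO'
>>> canonical_form('F00')
'FO'
>>> canonical_form('F0O')
'FO'
>>> canonical_form('FO0')
'FO'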
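To turn this into an actual check, one option is to compare the canonical form of the input against the canonical form of the correct word. The helper name `matches` below is made up for this sketch:

```python
import itertools


def canonical_form(s):
    # Uppercase, map look-alike digits to letters, collapse repeated runs.
    s = s.upper().replace('0', 'O').replace('1', 'I')
    return ''.join(k for k, _ in itertools.groupby(s))


def matches(candidate, correct):
    """True if both strings reduce to the same canonical form."""
    return canonical_form(candidate) == canonical_form(correct)


for attempt in ['FO', 'F00', 'F0O', 'FO0', 'foo', 'BAR']:
    print(attempt, matches(attempt, 'FOO'))
```

Keep in mind that this is fairly permissive: anything that collapses to FO, such as FOOOO, is accepted as well.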
Mark Byers answered Jan 20 '26 10:01


The standard library module difflib has a get_close_matches function.

You can use it like this:

>>> import difflib
>>> difflib.get_close_matches('FO', ['FOO', 'BAR', 'BAZ'])
['FOO']
>>> difflib.get_close_matches('F00', ['FOO', 'BAR', 'BAZ'])
[]
>>> difflib.get_close_matches('F0O', ['FOO', 'BAR', 'BAZ'])
['FOO']
>>> difflib.get_close_matches('FO0', ['FOO', 'BAR', 'BAZ'])
['FOO']

Notice that F00 does not match with the default cutoff of 0.6. You could lower the cutoff parameter to get a match:

>>> difflib.get_close_matches('F00', ['FOO', 'BAR', 'BAZ'], cutoff=0.3)
['FOO']
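To see why F00 falls below the default cutoff, it can help to print the similarity ratio that difflib computes for each attempt. Lowering the cutoff also lets more unrelated words through, so an alternative is to normalize the input before matching. The helpers `similarity` and `close_matches` are names invented for this sketch, and the 0-to-O replacement is an assumption about which typos you want to forgive:

```python
import difflib


def similarity(a, b):
    """Ratio in [0, 1] that get_close_matches compares against cutoff."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def close_matches(word, candidates, cutoff=0.6):
    # Assumed normalization: ignore case and treat the digit 0 as the letter O.
    normalized = word.upper().replace('0', 'O')
    return difflib.get_close_matches(normalized, candidates, cutoff=cutoff)


for attempt in ['FO', 'F00', 'F0O', 'FO0']:
    print(attempt, round(similarity(attempt, 'FOO'), 3))

print(close_matches('f00', ['FOO', 'BAR', 'BAZ']))
```

F00 shares only the F with FOO, which gives a ratio of about 0.333, whereas the other attempts share two characters and stay above 0.6. After normalization, f00 becomes FOO and matches at the default cutoff.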
jterrace answered Jan 20 '26 10:01


