How can I tell difflib.get_close_matches() to ignore case? I have a dictionary of names in a defined format, which includes capitalisation. The test string, however, might be fully capitalised or not capitalised at all, and these variants should be treated as equivalent. The results still need to be properly capitalised, so I can't simply use a modified dictionary.
import difflib
names = ['Acacia koa A.Gray var. latifolia (Benth.) H.St.John',
'Acacia koa A.Gray var. waianaeensis H.St.John',
'Acacia koaia Hillebr.',
'Acacia kochii W.Fitzg. ex Ewart & Jean White',
'Acacia kochii W.Fitzg.']
s = 'Acacia kochi W.Fitzg.'
# base case: proper capitalisation
print(difflib.get_close_matches(s,names,1,0.9))
# this should be equivalent from the perspective of my program
print(difflib.get_close_matches(s.upper(),names,1,0.9))
# this won't work because of the dictionary formatting
print(difflib.get_close_matches(s.upper().capitalize(),names,1,0.9))
Output:
['Acacia kochii W.Fitzg.']
[]
[]
Working code:
Based on Hugh Bothwell's answer, I have modified the code as follows to get a working solution (which should also work when more than one result is returned):
import difflib
names = ['Acacia koa A.Gray var. latifolia (Benth.) H.St.John',
'Acacia koa A.Gray var. waianaeensis H.St.John',
'Acacia koaia Hillebr.',
'Acacia kochii W.Fitzg. ex Ewart & Jean White',
'Acacia kochii W.Fitzg.']
test = {n.lower():n for n in names}
s1 = 'Acacia kochi W.Fitzg.' # base case
s2 = 'ACACIA KOCHI W.FITZG.' # test case
results = [test[r] for r in difflib.get_close_matches(s1.lower(),test,1,0.9)]
results += [test[r] for r in difflib.get_close_matches(s2.lower(),test,1,0.9)]
print(results)
Output:
['Acacia kochii W.Fitzg.', 'Acacia kochii W.Fitzg.']
I don't see any quick way to make difflib do case-insensitive comparison.
The quick-and-dirty solution seems to be:
1. Make a function that converts the string to some canonical form (for example: upper case, single spaced, no punctuation).
2. Use that function to make a dict of {canonical string: original string} and a list of [canonical string].
3. Run .get_close_matches against the canonical-string list, then plug the results through the dict to get the original strings back.
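The steps above can be sketched roughly as follows. This is only one possible reading of "canonical form": the helper names canonical and close_matches_canonical are made up for illustration, and replacing punctuation with a space (rather than deleting it) is an assumption, not something difflib prescribes.

```python
import difflib
import re


def canonical(text):
    """Hypothetical canonical form: upper case, punctuation replaced by
    spaces, runs of whitespace collapsed to a single space."""
    no_punct = re.sub(r"[^\w\s]", " ", text.upper())
    return " ".join(no_punct.split())


def close_matches_canonical(word, possibilities, n=3, cutoff=0.6):
    """Like difflib.get_close_matches, but compares canonical forms and
    returns the original, properly formatted strings."""
    # canonical string -> list of originals; several originals may collapse
    # to the same canonical form, so more than n strings can come back.
    lookup = {}
    for original in possibilities:
        lookup.setdefault(canonical(original), []).append(original)
    hits = difflib.get_close_matches(canonical(word), list(lookup), n, cutoff)
    return [original for hit in hits for original in lookup[hit]]


names = ['Acacia koa A.Gray var. latifolia (Benth.) H.St.John',
         'Acacia koa A.Gray var. waianaeensis H.St.John',
         'Acacia koaia Hillebr.',
         'Acacia kochii W.Fitzg. ex Ewart & Jean White',
         'Acacia kochii W.Fitzg.']

print(close_matches_canonical('ACACIA KOCHI W.FITZG.', names, 1, 0.9))
# ['Acacia kochii W.Fitzg.']
```

Stripping punctuation changes the similarity ratios slightly compared with a plain .lower(), so the cutoff may need retuning if you canonicalise more aggressively than this.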