Python: How to correct misspelled names

I have a list of city names, some of which are misspelled:

['bercelona', 'emstrdam', 'Praga']

and a list of all possible city names, spelled correctly:

['New York', 'Amsterdam', 'Barcelona', 'Berlin', 'Prague']

I'm looking for an algorithm that finds, for each name in the first list, the closest match in the second list, and returns the first list with the names spelled correctly. So it should return the following list:

['Barcelona', 'Amsterdam', 'Prague']
asked Dec 16 '16 at 21:12 by ebeneditos

1 Answer

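You can use difflib.SequenceMatcher from Python's standard library, which implements a variant of the Ratcliff/Obershelp ("gestalt pattern matching") algorithm: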

import difflib


def is_similar(first, second, ratio):
    # True if the similarity score (0.0 to 1.0) exceeds the threshold
    return difflib.SequenceMatcher(None, first, second).ratio() > ratio


first = ['bercelona', 'emstrdam', 'Praga']
second = ['New York', 'Amsterdam', 'Barcelona', 'Berlin', 'Prague']

# Keep every correctly spelled name that is similar enough to a misspelled one
result = [s for f in first for s in second if is_similar(f, s, 0.7)]
print(result)
# ['Barcelona', 'Amsterdam', 'Prague']

Here 0.7 is the similarity threshold. You may need to run some tests on your own data to choose a suitable value. The ratio measures how similar the two strings are: 1.0 means the strings are identical, 0.0 means they have nothing in common.
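According to the difflib documentation, ratio() returns 2.0*M/T, where M is the number of matched characters and T is the total length of both strings. For 'Praga' and 'Prague' the match is "Prag", so M = 4, T = 11 and the ratio is 8/11 ≈ 0.73, only just above 0.7. Note also that the comprehension above keeps every candidate that clears the threshold, so a name could produce several results or none, and the output would no longer line up with `first`. The following is a minimal sketch of a closest-match variant, assuming you want exactly one output per input and case-insensitive comparison. It uses the standard difflib.get_close_matches (which keeps candidates scoring at least `cutoff`); the function name `correct_names` and the fallback of returning unmatched names unchanged are my own illustrative choices, not part of the original answer.

```python
import difflib


def correct_names(misspelled, valid, cutoff=0.7):
    """Hypothetical helper: replace each name with its single closest match.

    Names with no candidate scoring at least `cutoff` are returned unchanged.
    Both sides are lowercased before comparing, so case does not affect the score.
    """
    # Map lowercased valid names back to their original spelling
    lookup = {name.lower(): name for name in valid}
    corrected = []
    for name in misspelled:
        # n=1 keeps only the best-scoring candidate at or above the cutoff
        matches = difflib.get_close_matches(name.lower(), list(lookup), n=1, cutoff=cutoff)
        corrected.append(lookup[matches[0]] if matches else name)
    return corrected


first = ['bercelona', 'emstrdam', 'Praga', 'Tokyo']
second = ['New York', 'Amsterdam', 'Barcelona', 'Berlin', 'Prague']
print(correct_names(first, second))
# ['Barcelona', 'Amsterdam', 'Prague', 'Tokyo']
```

Here 'Tokyo' has no close match in `second`, so it passes through as-is instead of disappearing from the result.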

answered Oct 19 '22 at 15:10 by pivanchy