I figure this problem is easier than a regular spell checker, since the list of U.S. cities is small compared to the set of all English words.
Anyway, here's the problem: I have text files full of city names, some of which are spelled correctly and some of which aren't.
What kind of algorithm can I use to correct all the misspellings of city names?
Do you actually need to correct the misspellings, or just flag them the way a normal spell checker does? If the latter, you only need a list of correct spellings and a check that each name matches an entry in that list.
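For the flag-only case, a minimal sketch might look like this. It assumes Python, a case-insensitive comparison, and a small hypothetical `KNOWN_CITIES` set standing in for a real list of U.S. city names; `flag_misspellings` is an illustrative name, not a library function.

```python
# Hypothetical reference list; in practice, load the full set of U.S. city names.
KNOWN_CITIES = {"Boston", "Chicago", "Houston", "Philadelphia", "San Francisco"}

def flag_misspellings(names, known=KNOWN_CITIES):
    """Return the names that do not exactly match an entry in the reference set."""
    # Case-fold both sides so "houston" and "Houston" count as the same spelling.
    folded = {k.casefold() for k in known}
    return [n for n in names if n.strip().casefold() not in folded]

print(flag_misspellings(["Boston", "Chicgo", "houston", "Philidelphia"]))
# ['Chicgo', 'Philidelphia']
```

Because the check is a set lookup, each name costs constant time on average regardless of how long the city list is.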
If you want to actually correct them, you will probably want to use edit distance to measure how similar each misspelled string is to the names in your reference list. You can then replace the misspelled name with the closest match. You may also want to handle the case where the intended city is not in your list at all, for example by leaving a name unchanged when even the closest match is too far away.
The Wikipedia article on Levenshtein distance is also a good resource.
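Here is a minimal sketch of the correction step using Levenshtein distance. It assumes Python, case-insensitive matching, and a hypothetical `max_distance` cutoff of 2 edits for deciding that a name probably isn't in the list at all; `levenshtein` and `correct_city` are illustrative names, and the threshold would need tuning on real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    # Classic dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

def correct_city(name, known, max_distance=2):
    """Return the closest known city, or None if nothing is within max_distance edits."""
    if not known:
        return None
    # Score every candidate; ties on distance are broken alphabetically by city name.
    scored = [(levenshtein(name.casefold(), city.casefold()), city) for city in known]
    distance, best = min(scored)
    return best if distance <= max_distance else None

KNOWN_CITIES = ["Boston", "Chicago", "Houston", "Philadelphia", "San Francisco"]
for raw in ["Chicgo", "Philidelphia", "Springfield"]:
    print(raw, "->", correct_city(raw, KNOWN_CITIES))
# Chicgo -> Chicago
# Philidelphia -> Philadelphia
# Springfield -> None
```

Returning `None` instead of always taking the nearest name is one way to handle a city that is missing from the list: the caller can leave it as-is or flag it for manual review. Comparing each word against every city is linear in the list size, which is acceptable here precisely because the list of city names is small.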