Matching incorrectly spelt words with correct ones in Python

Tags: python, regex

I'm building an app that receives incoming SMSs and, based on a keyword, checks whether that keyword is associated with any campaign it is running. The way I'm doing it now is to load a list of keywords and their possible misspellings; when an SMS comes in, I look through all keywords and variants to see if there is a match.

How would you do this without a precomputed list of misspellings, by actually looking for words that approximately match another word?

Let's say the correct spelling is HAMSTER. Normally I would give the campaign alternatives like HMSTER, HIMSTER, HAMSTAR, HAMSTR, HAMSTIR, etc.

Is there a smart way of doing this?

HAMSTER

"hamstir".compare_to("hamster") ? match

EDIT:

How about 2 words? Say we know there are two words that need to match in the SMS:

correct for first word = THE FIRST WORD

correct for second word = AND SECOND WORD

SMS = FIRST WORD SECOND

EDIT:

Ideally people should SMS the words comma-separated; that way I would know where to split and look for the words.

But what if they don't, like:

UNIQUE KEYWORD SECOND PARAMATER

How would I tell where the words split? The first word might be 3 words long and the second 3 or 1 or 2 etc.

In these examples, how would you use the techniques below to find the two words?

Would you look twice, once for each needed parameter or keyword?

Harry asked Dec 01 '22 23:12

1 Answer

The simplest solution is to use the standard library's difflib module, which has a get_close_matches function for approximate string matching:

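import difflib
# word: the text received; possibilities: the list of correctly spelt keywords.
# Returns up to 3 of the closest matches (default similarity cutoff 0.6), best first.
difflib.get_close_matches(word, possibilities)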
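
For the multi-word case in the question's edits, one possible approach (a sketch, not the only way) is to slide a window over the words of the SMS and compare each run of words against each keyword with get_close_matches. The function name find_keywords, the cutoff of 0.8, and the sample keywords are assumptions made for illustration; they would need tuning against real messages.

```python
import difflib


def find_keywords(sms, keywords, cutoff=0.8):
    """Map each keyword to the run of SMS words that best resembles it.

    Hypothetical helper: `cutoff` is an assumed similarity threshold (0 to 1).
    Keywords with no sufficiently close run of words are left out of the result.
    """
    # Normalise case and treat commas as ordinary separators.
    tokens = sms.upper().replace(",", " ").split()
    found = {}
    for keyword in keywords:
        size = len(keyword.split())
        # Every run of `size` consecutive words is a candidate for this keyword.
        windows = [" ".join(tokens[i:i + size])
                   for i in range(len(tokens) - size + 1)]
        best = difflib.get_close_matches(keyword.upper(), windows, n=1, cutoff=cutoff)
        if best:
            found[keyword] = best[0]
    return found


print(find_keywords("UNIQUE KEYWORD SECOND PARAMATER",
                    ["UNIQUE KEYWORD", "SECOND PARAMETER"]))
# {'UNIQUE KEYWORD': 'UNIQUE KEYWORD', 'SECOND PARAMETER': 'SECOND PARAMATER'}
```

So yes, this looks once per keyword. Because each window has exactly as many words as the keyword, a message that drops a word entirely may fall below the cutoff; trying windows one word shorter or longer, or lowering the cutoff, is a possible extension.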
David Robinson answered Dec 05 '22 02:12