Finding if two strings are almost similar

Tags:

python

string

regex

I want to find out whether two strings are almost similar. For example, a string like 'Mohan Mehta' should match 'Mohan Mehte' and vice versa. As another example, 'Umesh Gupta' should match 'Umash Gupte'.

Basically, one string is correct and the other is a misspelling of it. All my strings are names of people.

Any suggestions on how to achieve this?

The solution does not have to be 100 percent effective.

asked Jul 26 '15 23:07

Salil Agarwal

4 Answers

You can use difflib.SequenceMatcher if you want something from the stdlib:

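from difflib import SequenceMatcher

s_1 = 'Mohan Mehta'
s_2 = 'Mohan Mehte'
# ratio() returns a similarity score between 0.0 (nothing in common) and 1.0 (identical)
print(SequenceMatcher(a=s_1, b=s_2).ratio())
# Output: 0.909090909091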
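To turn the ratio into a yes/no answer, one option is to compare it against a cutoff. The sketch below is only an illustration: the helper name `is_similar` and the 0.8 threshold are assumptions, not part of the original answer, and the threshold would need tuning on real name data.

```python
from difflib import SequenceMatcher


def is_similar(a, b, threshold=0.8):
    """Return True if the two names are close enough (hypothetical helper).

    The 0.8 default is an assumed starting point, not a recommended value.
    """
    # Normalise case and collapse repeated whitespace before comparing.
    a_norm = " ".join(a.lower().split())
    b_norm = " ".join(b.lower().split())
    return SequenceMatcher(None, a_norm, b_norm).ratio() >= threshold


print(is_similar("Mohan Mehta", "Mohan Mehte"))    # True
print(is_similar("Umesh Gupta", "Umash Gupte"))    # True
print(is_similar("Mohan Mehta", "Salil Agarwal"))  # False
```

The second pair scores only about 0.82, so a stricter cutoff such as 0.9 would reject it.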

fuzzywuzzy is one of numerous libs that you can install. It is built on the difflib module and can optionally use python-Levenshtein for speed. You should also check out the Wikipedia page on approximate string matching.
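For reference, the edit distance that such libraries compute can be sketched in pure Python. This is a minimal, unoptimised illustration of the classic dynamic-programming Levenshtein distance; the function name `levenshtein` is chosen here for illustration and is not the API of fuzzywuzzy or python-Levenshtein.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes, or substitutions
    needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("Mohan Mehta", "Mohan Mehte"))  # 1
print(levenshtein("Umesh Gupta", "Umash Gupte"))  # 2
```

A small distance relative to the length of the name suggests a misspelling; what counts as "small" is a judgment call for your data.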

answered Oct 02 '22 08:10

Padraic Cunningham

Another approach is to use a "phonetic algorithm":

A phonetic algorithm is an algorithm for indexing of words by their pronunciation.

For example, using the Soundex algorithm:

>>> import soundex
>>> s = soundex.getInstance()
>>> s.soundex("Umesh Gupta")
'U5213'
>>> s.soundex("Umash Gupte")
'U5213'
>>> s.soundex("Umesh Gupta") == s.soundex("Umash Gupte")
True
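If installing a third-party module is not an option, a simplified American Soundex can be sketched with the standard library alone. Note that this is not the `soundex` package used above: the functions `soundex` and `names_sound_alike` below are hypothetical helpers that encode each word separately into a four-character code, so the codes differ from the 'U5213' shown in the session.

```python
def soundex(word):
    """Simplified American Soundex code for a single word (sketch)."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    previous = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not separate equal codes
        code = codes.get(ch, "")  # vowels (and anything unmapped) give ""
        if code and code != previous:
            result += code
        previous = code
    # Pad with zeros and truncate to the usual four characters.
    return (result + "000")[:4]


def names_sound_alike(a, b):
    """Compare two names word by word using their Soundex codes."""
    return [soundex(w) for w in a.split()] == [soundex(w) for w in b.split()]


print(soundex("Umesh"), soundex("Umash"))               # U520 U520
print(names_sound_alike("Umesh Gupta", "Umash Gupte"))  # True
print(names_sound_alike("Mohan Mehta", "Mohan Mehte"))  # True
```

Because Soundex always keeps the first letter, a misspelling in the first character of a word will not match with this approach.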

answered Oct 02 '22 08:10
