import difflib
a='abcd'
b='ab123'
seq=difflib.SequenceMatcher(a=a.lower(),b=b.lower())
seq=difflib.SequenceMatcher(a,b)
d=seq.ratio()*100
print d
I used the code above, but the output is 0.0. How can I get a valid result?
Hamming distance, named after the American mathematician Richard Hamming, is one of the simplest measures of string similarity. It counts the positions at which two strings of equal length differ.
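As a rough illustration of the idea, here is a minimal sketch of a Hamming distance function. The name hamming_distance is made up for this example and is not part of difflib or the standard library. It only applies to strings of equal length, so it would not work directly on 'abcd' and 'ab123' from the question.

```python
def hamming_distance(s, t):
    """Count the positions at which two equal-length strings differ."""
    if len(s) != len(t):
        # Hamming distance is only defined for sequences of equal length.
        raise ValueError("strings must have the same length")
    return sum(1 for x, y in zip(s, t) if x != y)


print(hamming_distance('karolin', 'kathrin'))  # 3
```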
The simplest way to check if two strings are equal in Python is to use the == operator. And if you are looking for the opposite, then != is what you need. That's it!
You forgot the first parameter to SequenceMatcher.
>>> import difflib
>>>
>>> a='abcd'
>>> b='ab123'
>>> seq=difflib.SequenceMatcher(None, a,b)
>>> d=seq.ratio()*100
>>> print d
44.4444444444
http://docs.python.org/library/difflib.html
From the docs:
The SequenceMatcher class has this constructor:
class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
The problem in your code is that by doing
seq=difflib.SequenceMatcher(a,b)
you are passing `a` as the value for `isjunk` and `b` as the value for `a`, leaving the default `''` value for `b`. This results in a ratio of 0.0.
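The binding can be checked by inspecting the matcher's a and b attributes after the positional call. A minimal sketch, written in Python 3 syntax (the variable name wrong is only for illustration):

```python
import difflib

a = 'abcd'
b = 'ab123'

# Positional call as in the question: the first argument lands in isjunk,
# the second in a, and b keeps its default empty string.
wrong = difflib.SequenceMatcher(a, b)

print(repr(wrong.a))   # 'ab123'
print(repr(wrong.b))   # ''
print(wrong.ratio())   # 0.0
```

Since the second sequence is empty, there is nothing to match against, which is why the ratio comes out as 0.0.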
One way to overcome this (already mentioned by Lennart) is to explicitly pass `None` as an extra first argument so that the two strings are assigned to the `a` and `b` parameters.
However, I just found, and wanted to mention, another solution that doesn't touch the `isjunk` argument but uses the `set_seqs()` method to specify the sequences.
>>> import difflib
>>> a = 'abcd'
>>> b = 'ab123'
>>> seq = difflib.SequenceMatcher()
>>> seq.set_seqs(a.lower(), b.lower())
>>> d = seq.ratio()*100
>>> print d
44.44444444444444
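For comparison, here is a short sketch in Python 3 syntax that puts the approaches side by side. The keyword-argument form is not mentioned in the answers, but it follows from the documented constructor signature. The variable names by_position, by_keyword, and by_set_seqs are only for illustration.

```python
import difflib

a = 'abcd'
b = 'ab123'

# 1. Pass None explicitly for isjunk so a and b land in the right slots.
by_position = difflib.SequenceMatcher(None, a, b).ratio()

# 2. Name the parameters, which skips isjunk entirely.
by_keyword = difflib.SequenceMatcher(a=a, b=b).ratio()

# 3. Create an empty matcher and set the sequences afterwards.
matcher = difflib.SequenceMatcher()
matcher.set_seqs(a, b)
by_set_seqs = matcher.ratio()

# All three give the same ratio: 2 * 2 matching characters / 9 total.
print(by_position * 100, by_keyword * 100, by_set_seqs * 100)
```

Note that the question's first call already used keyword arguments correctly. It was the second, positional call that overwrote seq and produced 0.0.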