I was trying out Python's difflib module and came across SequenceMatcher. I tried the following examples but couldn't understand what is happening.
>>> from difflib import SequenceMatcher
>>> SequenceMatcher(None,"abc","a").ratio()
0.5
>>> SequenceMatcher(None,"aabc","a").ratio()
0.4
>>> SequenceMatcher(None,"aabc","aa").ratio()
0.6666666666666666
Now, according to the documentation for ratio():
Return a measure of the sequences' similarity as a float in the range [0, 1]. Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T.
So, for my cases:
T=4 and M=1, so ratio = 2*1/4 = 0.5
T=5 and M=2, so ratio = 2*2/5 = 0.8
T=6 and M=1, so ratio = 2*1/6.0 = 0.33
According to my understanding, T = len(aabc) + len(a) and M=2, because a comes twice in aabc.
So, where am I going wrong? What am I missing?
Here is the source code of SequenceMatcher.ratio()
SequenceMatcher is a class that is available in the difflib Python package. The difflib module provides classes and functions for comparing sequences. It can be used to compare files and can produce information about file differences in various formats. This class can be used to compare two input sequences or strings.
SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching".
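To illustrate the "any type, so long as the elements are hashable" point, here is a minimal sketch that compares two lists of words instead of two strings. The sample sentences and variable names are made up for illustration.

```python
from difflib import SequenceMatcher

# Lists of words work just like strings: each word is one hashable element.
words_a = "the quick brown fox".split()
words_b = "the quick red fox".split()

matcher = SequenceMatcher(None, words_a, words_b)
word_ratio = matcher.ratio()

# "the", "quick" and "fox" match, so M = 3 and T = 4 + 4 = 8.
print(word_ratio)  # 0.75
```

Here the ratio is computed over whole words, so "brown" versus "red" counts as one mismatched element rather than several mismatched characters.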
Difflib is a Python module that contains several easy-to-use functions and classes that allow users to compare sets of data. The module presents the results of these sequence comparisons in a human-readable format, utilizing deltas to display the differences more cleanly.
This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce information about file differences in various formats, including HTML and context and unified diffs.
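As a small illustration of the unified diff output mentioned above, the sketch below feeds two short lists of lines to difflib.unified_diff. The file names old.txt and new.txt are placeholders chosen for the example, not real files.

```python
import difflib

# Two versions of a tiny "file", given as lists of lines with newlines kept.
old_lines = ["alpha\n", "beta\n", "gamma\n"]
new_lines = ["alpha\n", "delta\n", "gamma\n"]

# unified_diff returns a generator of diff lines; the names are only labels.
diff_lines = list(
    difflib.unified_diff(old_lines, new_lines, fromfile="old.txt", tofile="new.txt")
)

print("".join(diff_lines))
```

The printed diff marks the removed line with a leading "-" and the added line with a leading "+".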
You've got the first case right. In the second case, only one a from aabc matches, so M = 1. In the third example, both a's match, so M = 2.
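One way to check these values of M yourself is to sum the sizes of the blocks returned by get_matching_blocks() and apply 2.0*M / T by hand. This is a minimal sketch; explain_ratio is a helper name invented for this example, not part of difflib.

```python
from difflib import SequenceMatcher


def explain_ratio(a, b):
    """Return (M, T, 2.0*M/T) for sequences a and b (hypothetical helper)."""
    matcher = SequenceMatcher(None, a, b)
    # Each block has a .size; the final block is a dummy with size 0.
    m = sum(block.size for block in matcher.get_matching_blocks())
    t = len(a) + len(b)
    # ratio() itself returns 1.0 when both sequences are empty.
    value = 2.0 * m / t if t else 1.0
    return m, t, value


for a, b in [("abc", "a"), ("aabc", "a"), ("aabc", "aa")]:
    m, t, value = explain_ratio(a, b)
    print(a, b, "M =", m, "T =", t, "ratio =", value)
```

Each element of the shorter string can be matched at most once, which is why "aabc" against "a" gives M = 1 even though a occurs twice in "aabc".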
[P.S.: you're referring to the ancient Python 2.4 source code. The current source code is at hg.python.org.]