How does SequenceMatcher.ratio work in difflib?

Tags: python, string, string-matching, similarity

I was trying out Python's difflib module and came across SequenceMatcher. I tried the following examples but couldn't understand the results.

>>> from difflib import SequenceMatcher
>>> SequenceMatcher(None, "abc", "a").ratio()
0.5

>>> SequenceMatcher(None, "aabc", "a").ratio()
0.4

>>> SequenceMatcher(None, "aabc", "aa").ratio()
0.6666666666666666

Now, according to the documentation for ratio():

Return a measure of the sequences' similarity as a float in the range [0, 1]. Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T.

So, for my cases:

1. T=4 and M=1, so ratio 2*1/4 = 0.5
2. T=5 and M=2, so ratio 2*2/5 = 0.8
3. T=6 and M=1, so ratio 2*1/6.0 = 0.33

According to my understanding, T = len("aabc") + len("a") and M = 2 because a occurs twice in aabc.

So, where am I going wrong? What am I missing?

Here is the source code of SequenceMatcher.ratio()

247

asked Sep 15 '12 10:09

RanRag

1 Answer

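You've got the first case right. In the second case, only one a from aabc matches, so M = 1 and the ratio is 2*1/5 = 0.4. In the third example, both a's match, so M = 2 and the ratio is 2*2/6 = 0.667. M counts matched pairs of elements: each a in the shorter string can pair with only one a in aabc, so a character that repeats in just one of the strings does not add matches.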
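To check these match counts yourself, here is a minimal sketch that assumes M can be read off as the total size of the blocks returned by get_matching_blocks(). The helper name match_count is made up for illustration and is not part of difflib.

```python
from difflib import SequenceMatcher


def match_count(a, b):
    """Return M as the summed size of all matching blocks (hypothetical helper)."""
    matcher = SequenceMatcher(None, a, b)
    # get_matching_blocks() yields Match(a, b, size) triples; the final one
    # is a sentinel with size 0, so it does not affect the sum.
    return sum(block.size for block in matcher.get_matching_blocks())


for a, b in [("abc", "a"), ("aabc", "a"), ("aabc", "aa")]:
    m = match_count(a, b)
    t = len(a) + len(b)
    # Print M, T and the ratio computed by hand next to ratio() itself.
    print(a, b, "M =", m, "T =", t, 2.0 * m / t, SequenceMatcher(None, a, b).ratio())
```

For the three pairs in the question this should report M = 1, 1 and 2, with the hand-computed value agreeing with ratio() in each case.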

[P.S.: you're referring to the ancient Python 2.4 source code. The current source code is at hg.python.org.]

178

answered Oct 25 '22 01:10

Fred Foo
