
What is a simple fuzzy string matching algorithm in Python?

Tags:

python

I'm trying to find a good fuzzy string matching algorithm. Direct matching doesn't work for me because the match fails unless my strings are 100% identical. The Levenshtein method doesn't work too well for my strings either, since it works at the character level. I was looking for something along the lines of word-level matching, e.g.:

String A: The quick brown fox.

String B: The quick brown fox jumped over the lazy dog.

These should match as all words in string A are in string B.

Now, this is an oversimplified example, but would anyone know a good fuzzy string matching algorithm that works at the word level?

Mridang Agarwalla asked May 27 '10 17:05



1 Answer

I like Drew's answer.

You can use difflib to find the longest match:

>>> a = 'The quick brown fox.'
>>> b = 'The quick brown fox jumped over the lazy dog.'
>>> import difflib
>>> s = difflib.SequenceMatcher(None, a, b)
>>> s.find_longest_match(0,len(a),0,len(b))
Match(a=0, b=0, size=19) # returns NamedTuple (new in v2.6)

Or pick some minimum matching threshold. Example:

>>> difflib.SequenceMatcher(None, a, b).ratio()
0.61538461538461542
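
Note that both calls above compare the strings character by character. Since the question asks about word-level matching, here is a rough sketch of the same idea applied to lists of words. The helper names (words, word_ratio, word_containment) and the tokenizing regex are my own assumptions for illustration, not part of difflib; SequenceMatcher itself accepts any sequences of hashable items, so passing word lists is fine.

```python
import difflib
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation.

    The regex is an assumed, deliberately simple tokenizer.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_ratio(a, b):
    """SequenceMatcher ratio computed over word lists instead of characters."""
    return difflib.SequenceMatcher(None, words(a), words(b)).ratio()


def word_containment(a, b):
    """Fraction of the distinct words in a that also occur in b."""
    wa, wb = set(words(a)), set(words(b))
    if not wa:
        return 0.0
    # float() keeps this a true division on Python 2 as well.
    return float(len(wa & wb)) / len(wa)


a = 'The quick brown fox.'
b = 'The quick brown fox jumped over the lazy dog.'

print(word_ratio(a, b))        # 2 * 4 matching words / (4 + 9 words in total)
print(word_containment(a, b))  # 1.0, since every word of a appears in b
```

word_containment is probably closer to what the question describes ("all words in string A are in string B"), while word_ratio also penalizes the extra words in b. Either score can then be compared against whatever minimum threshold suits your data.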
mechanical_meat answered Sep 28 '22 17:09
