Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fuzzy String Comparison

What I am striving to complete is a program which reads in a file and will compare each sentence according to the original sentence. The sentence which is a perfect match to the original will receive a score of 1 and a sentence which is the total opposite will receive a 0. All other fuzzy sentences will receive a grade in between 1 and 0.

I am unsure which operation to use to allow me to complete this in Python 3.

I have included the sample text in which the Text 1 is the original and the other preceding strings are the comparisons.

Text: Sample

Text 1: It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats.

Text 20: It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines // Should score high point but not 1

Text 21: It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines // Should score lower than text 20

Text 22: I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines. It was a murky and tempestuous night. // Should score lower than text 21 but NOT 0

Text 24: It was a dark and stormy night. I was not alone. I was not sitting on a red chair. I had three cats. // Should score a 0!

like image 519
jacksonstephenc Avatar asked Apr 30 '12 11:04

jacksonstephenc


People also ask

What is fuzzy matching example?

Fuzzy Matching (also called Approximate String Matching) is a technique that helps identify two elements of text, strings, or entries that are approximately similar but are not exactly the same. For example, let's take the case of hotels listing in New York as shown by Expedia and Priceline in the graphic below.

How do fuzzy matches work?

Fuzzy matching is a technique used in computer-assisted translation as a special case of record linkage. It works with matches that may be less than 100% perfect when finding correspondences between segments of a text and entries in a database of previous translations.

What is fuzzy name matching?

What is fuzzy name matching? Fuzzy matching assigns a probability to a match between 0.0 and 1.0 based on linguistic and statistical methods instead of just choosing either 1 (true) or 0 (false). As a result, names Robert and Bob can be a match with high probability even though they're not identical.


1 Answers

There is a package called fuzzywuzzy. Install via pip:

pip install fuzzywuzzy 

Simple usage:

>>> from fuzzywuzzy import fuzz >>> fuzz.ratio("this is a test", "this is a test!")     96 

The package is built on top of difflib. Why not just use that, you ask? Apart from being a bit simpler, it has a number of different matching methods (like token order insensitivity, partial string matching) which make it more powerful in practice. The process.extract functions are especially useful: find the best matching strings and ratios from a set. From their readme:

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")     100 

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")     90 >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")     100 

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")     84 >>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")     100 

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"] >>> process.extract("new york jets", choices, limit=2)     [('New York Jets', 100), ('New York Giants', 78)] >>> process.extractOne("cowboys", choices)     ("Dallas Cowboys", 90) 
like image 147
congusbongus Avatar answered Oct 04 '22 02:10

congusbongus