Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the similarity metric between two strings

How do I get the probability of a string being similar to another string in Python?

I want to get a decimal value like 0.9 (meaning 90%) etc. Preferably with standard Python and library.

e.g.

similar("Apple","Appel") #would have a high prob.  similar("Apple","Mango") #would have a lower prob. 
like image 372
tenstar Avatar asked Jun 30 '13 07:06

tenstar


People also ask

How do you find the similarity between two strings?

Hamming Distance, named after the American mathematician, is the simplest algorithm for calculating string similarity. It checks the similarity by comparing the changes in the number of positions between the two strings.

How do you check for similar strings in Python?

Comparing strings using the == and != The simplest way to check if two strings are equal in Python is to use the == operator. And if you are looking for the opposite, then != is what you need. That's it!

What is string similarity search?

Abstract: String similarity search is a fundamental query that has been widely used for DNA sequencing, error-tolerant query autocompletion, and data cleaning needed in database, data warehouse, and data mining.


2 Answers

There is a built in.

from difflib import SequenceMatcher  def similar(a, b):     return SequenceMatcher(None, a, b).ratio() 

Using it:

>>> similar("Apple","Appel") 0.8 >>> similar("Apple","Mango") 0.0 
like image 153
Inbar Rose Avatar answered Oct 02 '22 22:10

Inbar Rose


I think maybe you are looking for an algorithm describing the distance between strings. Here are some you may refer to:

  1. Hamming distance
  2. Levenshtein distance
  3. Damerau–Levenshtein distance
  4. Jaro–Winkler distance
like image 23
hbprotoss Avatar answered Oct 02 '22 23:10

hbprotoss