
Ruby compare two strings similarity percentage

I'd like to compare two strings in Ruby and find how similar they are.

I've had a look at the Levenshtein gem, but it seems it was last updated in 2008 and I can't find documentation on how to use it. Some blogs also suggest it's broken.

I tried the text gem's Levenshtein implementation, but it returns an integer edit distance (smaller is better) rather than a percentage.

If the two strings have different lengths, I run into problems with the Levenshtein algorithm (say, when comparing two names where one has a middle name and the other doesn't).

What would you suggest I do to get a percentage comparison?

Edit: I'm looking for something similar to PHP's similar_text.

Tarang asked Mar 22 '12 12:03




2 Answers

I think your question could do with some clarifications, but here's something quick and dirty (calculating as percentage of the longer string as per your clarification above):

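def string_difference_percent(a, b)
  longer = [a.size, b.size].max
  return 0.0 if longer.zero? # two empty strings are identical
  # Count the positions at which both strings hold the same character.
  same = a.each_char.zip(b.each_char).count { |x, y| x == y }
  # Fraction (0.0 to 1.0) of the longer string that differs; multiply by 100 for a percentage.
  (longer - same) / longer.to_f
end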

I'm still not sure how much sense this percent difference you are looking for makes, but this should get you started at least.

It's a cruder relative of Levenshtein distance, in that it compares the strings character by character at matching positions. An inserted middle name shifts every later character, so two names that differ only by the middle name will actually come out as very different.
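If an edit distance is closer to what you want, the integer can be scaled by the length of the longer string to get a percentage. The following is only a minimal sketch of that idea in plain Ruby, not the code of any gem: levenshtein_distance and levenshtein_similarity_percent are hypothetical names, and dividing by the longer length is an assumption about what "percentage" should mean here. The distance returned by the text gem could presumably be plugged in instead of the hand-written one.

```ruby
# Classic dynamic-programming Levenshtein distance:
# insertions, deletions and substitutions each cost 1.
def levenshtein_distance(a, b)
  return b.size if a.empty?
  return a.size if b.empty?

  # previous[j] is the distance between the processed prefix of a
  # and the first j characters of b.
  previous = (0..b.size).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution or match
      ].min
    end
    previous = current
  end
  previous.last
end

# Hypothetical helper: scale the distance by the longer string's length
# (an assumption) so the result lies between 0.0 and 100.0.
def levenshtein_similarity_percent(a, b)
  longer = [a.size, b.size].max
  return 100.0 if longer.zero? # two empty strings are identical
  (1 - levenshtein_distance(a, b) / longer.to_f) * 100
end

puts levenshtein_similarity_percent("John Smith", "John Michael Smith").round(2)
```

Because an inserted middle name costs one edit per inserted character instead of shifting everything after it, the two names in this example score noticeably higher than they would with a position-by-position comparison.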

Michael Kohl answered Sep 21 '22 11:09



There is now a Ruby gem for similar_text: https://rubygems.org/gems/similar_text. It provides a method named similar that compares two strings and returns a number representing the percent similarity between them.
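For anyone who would rather not add a dependency, here is a rough pure-Ruby sketch of the approach PHP's similar_text is generally described as using: find the longest common substring, recurse on the pieces to its left and right, and report twice the matched character count as a percentage of the combined length. This is not the gem's source, and longest_common_substring, common_char_count and similarity_percent are hypothetical names chosen for illustration.

```ruby
# Returns [start_in_a, start_in_b, length] of the first longest common substring.
def longest_common_substring(a, b)
  best = [0, 0, 0]
  (0...a.size).each do |i|
    (0...b.size).each do |j|
      len = 0
      len += 1 while i + len < a.size && j + len < b.size && a[i + len] == b[j + len]
      best = [i, j, len] if len > best[2]
    end
  end
  best
end

# Counts matching characters: the longest common substring plus, recursively,
# whatever matches to the left of it and to the right of it.
def common_char_count(a, b)
  i, j, len = longest_common_substring(a, b)
  return 0 if len.zero?

  len +
    common_char_count(a[0...i], b[0...j]) +
    common_char_count(a[(i + len)..-1], b[(j + len)..-1])
end

# Twice the matched characters over the combined length, as a percentage.
def similarity_percent(a, b)
  total = a.size + b.size
  return 0.0 if total.zero? # nothing to compare
  common_char_count(a, b) * 2 * 100.0 / total
end

puts similarity_percent("World", "Word").round(2) # prints 88.89
```

The nested loops make this cubic in the worst case, which is fine for names and short phrases but probably too slow for long documents.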

user2837093 answered Sep 20 '22 11:09
