Id like to compare two strings in Ruby and find their similarity
I've had a look at the Levenshtein
gem but it seems this was last updated in 2008 and I can't find documentation how to use it. With some blogs suggesting its broken
I tried the text
gem with Levenshtein but it gives an integer (smaller is better)
Obviously if the two strings are of variable length I run into problems with the Levenshtein Algorithm (Say comparing two names, where one has a middle name and one doesnt).
What would you suggest I do to get a percentage comparison?
Edit: Im looking for something similar to PHP's similar text
Hamming Distance, named after the American mathematician, is the simplest algorithm for calculating string similarity. It checks the similarity by comparing the changes in the number of positions between the two strings.
Two strings or boolean values, are equal if they both have the same length and value. In Ruby, we can use the double equality sign == to check if two strings are equal or not. If they both have the same length and content, a boolean value True is returned. Otherwise, a Boolean value False is returned.
You should not use == (equality operator) to compare these strings because they compare the reference of the string, i.e. whether they are the same object or not. On the other hand, equals() method compares whether the value of the strings is equal, and not the object itself.
How to compare strings in Ruby. You can compare strings using “==”, but strings are case-sensitive. Because of this, it's common practice to call downcase or upcase on both strings to convert them into the same case before comparing them.
I think your question could do with some clarifications, but here's something quick and dirty (calculating as percentage of the longer string as per your clarification above):
def string_difference_percent(a, b)
longer = [a.size, b.size].max
same = a.each_char.zip(b.each_char).select { |a,b| a == b }.size
(longer - same) / a.size.to_f
end
I'm still not sure how much sense this percent difference you are looking for makes, but this should get you started at least.
It's a bit like Levensthein distance, in that it compares the strings character by character. So if two names differ only by the middle name, they'll actually be very different.
There is now a ruby gem for similar_text. https://rubygems.org/gems/similar_text
It provides a similar
method that compares two strings and returns a number representing the percent similarity between the two strings.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With