Measure the distance between two strings with Ruby?

Can I measure the distance between two strings with Ruby?

For example:

compare('Test', 'est') # Returns 1
compare('Test', 'Tes') # Returns 1
compare('Test', 'Tast') # Returns 1
compare('Test', 'Taste') # Returns 2
compare('Test', 'tazT') # Returns 4

asked May 01 '13 17:05 by Caio Tarifa



4 Answers

I found this for you:

def levenshtein_distance(s, t)
  m = s.length
  n = t.length
  return m if n == 0
  return n if m == 0
  d = Array.new(m+1) {Array.new(n+1)}

  (0..m).each {|i| d[i][0] = i}
  (0..n).each {|j| d[0][j] = j}
  (1..n).each do |j|
    (1..m).each do |i|
      d[i][j] = if s[i-1] == t[j-1]  # adjust index into string
                  d[i-1][j-1]       # no operation required
                else
                  [ d[i-1][j]+1,    # deletion
                    d[i][j-1]+1,    # insertion
                    d[i-1][j-1]+1,  # substitution
                  ].min
                end
    end
  end
  d[m][n]
end

[ ['fire','water'], ['amazing','horse'], ["bamerindos", "giromba"] ].each do |s,t|
  puts "levenshtein_distance('#{s}', '#{t}') = #{levenshtein_distance(s, t)}"
end

Output:

levenshtein_distance('fire', 'water') = 4
levenshtein_distance('amazing', 'horse') = 7
levenshtein_distance('bamerindos', 'giromba') = 9

Source: http://rosettacode.org/wiki/Levenshtein_distance#Ruby
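
To tie this back to the examples in the question, here is a minimal sketch of the same dynamic-programming idea that keeps only two rows of the table instead of the full matrix. The name compare is taken from the question and is not a built-in Ruby method. The comparison is case-sensitive, so the last pair comes out as 4, since four substitutions suffice.

```ruby
# Levenshtein distance keeping only the previous and current DP rows.
# `compare` is the name used in the question, not part of Ruby itself.
def compare(s, t)
  return t.length if s.empty?
  return s.length if t.empty?

  prev = (0..t.length).to_a            # distances from "" to each prefix of t
  s.each_char.with_index(1) do |sc, i|
    curr = [i]                         # distance from s[0, i] to ""
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j] + 1,            # deletion
               curr[j - 1] + 1,        # insertion
               prev[j - 1] + cost].min # substitution (or match)
    end
    prev = curr
  end
  prev.last
end

p compare('Test', 'est')   # => 1
p compare('Test', 'Tes')   # => 1
p compare('Test', 'Tast')  # => 1
p compare('Test', 'Taste') # => 2
p compare('Test', 'tazT')  # => 4
```

Memory use is proportional to the length of the second string rather than to the product of both lengths, which may matter for long inputs.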


answered Oct 01 '22 07:10 by Hugo Demiglio


Much easier, and fast thanks to the native C binding:

gem install levenshtein-ffi
gem install levenshtein

require 'levenshtein'

Levenshtein.normalized_distance string1, string2, threshold

http://rubygems.org/gems/levenshtein
http://rubydoc.info/gems/levenshtein/0.2.2/frames

answered Oct 01 '22 06:10 by Michael Franzl


There is a utility method in RubyGems that really should be public but isn't. Anyway:

require "rubygems/text"
ld = Class.new.extend(Gem::Text).method(:levenshtein_distance)

p ld.call("asd", "sdf") # => 2

answered Oct 01 '22 06:10 by Nakilon


Much simpler. I'm a Ruby show-off at times...

# Levenshtein distance, translated from Wikipedia pseudocode by ross

def lev s, t
  return t.size if s.empty?
  return s.size if t.empty?
  return [ (lev s.chop, t) + 1,
           (lev s, t.chop) + 1,
           (lev s.chop, t.chop) + (s[-1, 1] == t[-1, 1] ? 0 : 1)
       ].min
end
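
One caveat: the plain recursion recomputes the same subproblems over and over, so its running time grows exponentially with string length. A minimal sketch of a memoized variant, which recurses on prefix lengths instead of chopped copies, might look like this. The name lev_memo and its extra parameters are made up for this example.

```ruby
# Memoized recursion over prefix lengths (i, j) of s and t.
# `lev_memo` and its i/j/memo parameters are hypothetical names for this sketch.
def lev_memo(s, t, i = s.size, j = t.size, memo = {})
  return j if i.zero?   # s prefix is empty: insert j characters
  return i if j.zero?   # t prefix is empty: delete i characters

  cost = s[i - 1] == t[j - 1] ? 0 : 1
  memo[[i, j]] ||= [
    lev_memo(s, t, i - 1, j, memo) + 1,        # deletion
    lev_memo(s, t, i, j - 1, memo) + 1,        # insertion
    lev_memo(s, t, i - 1, j - 1, memo) + cost  # substitution (or match)
  ].min
end

p lev_memo('fire', 'water')         # => 4
p lev_memo('bamerindos', 'giromba') # => 9
```

Each (i, j) pair is computed once, so the work is proportional to the product of the two lengths. Very long strings could still hit Ruby's recursion depth limit, in which case the iterative table version is the safer choice.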

answered Oct 01 '22 07:10 by DigitalRoss