
Detect similar sounding words in Ruby

Tags: ruby, phoneme

I'm aware of SOUNDEX and (double) Metaphone, but these don't let me test for the similarity of words as a whole - for example "Hi" sounds very similar to "Bye", but both of these methods will mark them as completely different.

Are there any libraries in Ruby, or any methods you know of, that are capable of determining the similarity between two words? (Either a boolean is/isn't similar, or numerical 40% similar)

edit: Extra bonus points if there is an easy method to 'drop in' a different dialect or language!

JP. asked Jan 23 '23 09:01



1 Answer

I think you're describing Levenshtein distance. And yes, there are gems for that. If you're into pure Ruby, go for the text gem.

$ gem install text

The docs have more details, but here's the crux of it:

Text::Levenshtein.distance('test', 'test')    # => 0
Text::Levenshtein.distance('test', 'tent')    # => 1
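
For intuition about what that number means, here is a minimal pure-Ruby sketch of the classic dynamic-programming edit distance. The name edit_distance is a hypothetical helper for illustration only, not the gem's implementation; in real code the gem calls above are the better choice.

```ruby
# Illustrative only: counts the single-character insertions, deletions,
# and substitutions needed to turn string a into string b.
# `edit_distance` is a hypothetical name, not part of the text gem.
def edit_distance(a, b)
  a_chars = a.chars
  b_chars = b.chars

  # previous[j] = distance between the prefix of a processed so far
  # and the first j characters of b.
  previous = (0..b_chars.length).to_a

  a_chars.each_with_index do |a_char, i|
    current = [i + 1]
    b_chars.each_with_index do |b_char, j|
      cost = a_char == b_char ? 0 : 1
      current << [
        previous[j + 1] + 1, # delete a_char
        current[j] + 1,      # insert b_char
        previous[j] + cost   # substitute, or keep on a match
      ].min
    end
    previous = current
  end

  previous.last
end

edit_distance('test', 'tent')      # => 1
edit_distance('kitten', 'sitting') # => 3
```

The comparison is by spelling and is case-sensitive, so downcase both words first if case should not count.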

If you're OK with native extensions...

$ gem install levenshtein

Its usage is similar, and its performance is very good. (It handles ~1000 spelling corrections per minute on my systems.)

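If you need to know how similar two words are, divide the distance by the word length (for example, the length of the longer word) to get a relative measure rather than a raw count.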
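
One way to sketch "distance over word length" is below. Dividing by the longer word's length and subtracting from 1 is an assumption on my part (other normalizations exist), and similarity_ratio is a hypothetical helper name. It takes a distance you have already computed, for instance with Text::Levenshtein.distance.

```ruby
# Hypothetical helper: converts an edit distance into a score from
# 0.0 (nothing in common) to 1.0 (identical).
# Assumption: normalize by the length of the longer word.
def similarity_ratio(distance, a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero? # two empty strings are identical

  1.0 - distance.to_f / longest
end

# 'test' vs 'tent' has distance 1 (see the example above).
similarity_ratio(1, 'test', 'tent') # => 0.75
```

Keep in mind this scores spelling, not sound: "Hi" and "Bye" share no letters, so their distance is 3 and the ratio comes out as 0.0.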

If you want a simple similarity test, consider something like this (untested, but straightforward):

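require 'text'

String.module_eval do
  # True when the edit distance to `other` is at most `threshold`.
  def similar?(other, threshold = 2)
    distance = Text::Levenshtein.distance(self, other)
    distance <= threshold
  end
end

'test'.similar?('tent')    # => true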
Levi answered Feb 16 '23 03:02
