Ruby gem for text comparison

I am looking for a gem that can compare two strings (in this case, paragraphs of text) and gauge how likely they are to be similar in content, with perhaps only a few words rearranged or changed. I believe that Stack Overflow uses something similar when users submit questions.

Jackson Henley asked Jun 27 '12 00:06

1 Answer

I'd probably use something like Diff::LCS:

>> require "diff/lcs"
>> seq1 = "lorem ipsum dolor sit amet consequtor".split(" ")
>> seq2 = "lorem ipsum dolor amet sit consequtor".split(" ")
>> Diff::LCS.diff(seq1, seq2).length
=> 2

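It uses the longest common subsequence (LCS) algorithm; the method for deriving a diff from an LCS is described on the algorithm's wiki page. The array returned by Diff::LCS.diff holds one entry per changed hunk, so a smaller length means the two texts are closer.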
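To turn the LCS idea into a single similarity score, something like the following might work. This is a minimal sketch in plain Ruby, not the gem's implementation: `lcs_length` and `similarity` are hypothetical helper names, and the 2 * LCS / (total words) ratio is an assumed scoring choice rather than anything Diff::LCS provides.

```ruby
# Length of the longest common subsequence of two arrays,
# computed with the classic dynamic-programming table (one row at a time).
def lcs_length(a, b)
  # prev[j] is the LCS length of the elements of `a` seen so far and b[0...j]
  prev = Array.new(b.length + 1, 0)
  a.each do |x|
    curr = [0]
    b.each_with_index do |y, j|
      # On a match, extend the diagonal; otherwise carry the better neighbour.
      curr << (x == y ? prev[j] + 1 : [prev[j + 1], curr[j]].max)
    end
    prev = curr
  end
  prev.last
end

# Hypothetical score in 0.0..1.0: 1.0 for identical word sequences,
# 0.0 when the texts share no words at all.
def similarity(text1, text2)
  w1 = text1.split
  w2 = text2.split
  return 1.0 if w1.empty? && w2.empty?
  2.0 * lcs_length(w1, w2) / (w1.length + w2.length)
end

puts similarity("lorem ipsum dolor sit amet consequtor",
                "lorem ipsum dolor amet sit consequtor")
```

For the two sentences from the example, five of the six words stay in order, so the score is 10/12, roughly 0.83. A cutoff for "similar enough" would have to be tuned on real paragraphs.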

Yehuda Katz answered Oct 22 '22 21:10