Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I compute a difference between two strings?

I want to create a function in Delphi that computes different levels of two strings. If two strings are equal (ignoring case), then it should return 0, but if they are not equal, it should return the number of different characters. This function can be very useful for checking spelling.

function GetDiffStringLevel(S1,S2:string):Integer;
begin
  if SameText(S1,S2) then Exit(0);
  // i want get different chars count
end

samples code:

Diff:=GetDiffStringLevel('Hello','Hello');// Diff:=0;
Diff:=GetDiffStringLevel('Hello','2Hello');// Diff:=1;
Diff:=GetDiffStringLevel('Hello','H2ello');// Diff:=1;
Diff:=GetDiffStringLevel('Hello','Hello W');// Diff:=2;
Diff:=GetDiffStringLevel('Hello','World');// Diff:=6; or 5
like image 821
MajidTaheri Avatar asked May 14 '12 08:05

MajidTaheri


2 Answers

Fast and compact implementation.

About 3 times as fast as smasher's implementation with normal strings. More than 100 times as fast if one of the strings is empty.

Smasher's function is case insensitive though, which can be useful as well.

function LevenshteinDistance(const s, t: string): integer;inline;
var
  d   : array of array of integer;
  n, m, i, j : integer;
begin
  n := length(s);
  m := length(t);
  if n = 0 then Exit(m);
  if m = 0 then Exit(n);

  SetLength(d, n + 1, m + 1);
  for i := 0 to n do d[i, 0] := i;
  for j := 0 to m do d[0, j] := j;

  for i := 1 to n do
    for j := 1 to m do
      d[i, j] := Min(Min(d[i-1, j]+1, d[i,j-1]+1), d[i-1,j-1]+Integer(s[i] <> t[j]));

  Result := d[n, m];
end;

Note: the inline directive reduces the execution time to less than 70% on my machine, but only for the win32 target platform. If you compile to 64bits (Delphi XE2), inlining actually makes it a tiny bit slower.

like image 62
Wouter van Nifterick Avatar answered Sep 23 '22 16:09

Wouter van Nifterick


What you want is known as the Levenshtein distance (the minimum number of edits to transform one string into the other, where an edit is either a character insertion, character deletion or character substitution). The wikipedia site has a pseudo code implementation.

Delphi implementation:

function LevenshteinDistance(String1 : String; String2 : String) : Integer;

var
  Length1, Length2      : Integer;
  WorkMatrix            : array of array of Integer;
  I, J                  : Integer;
  Cost                  : Integer;
  Val1, Val2, Val3      : Integer;

begin
String1 := TCharacter.ToUpper (String1);
String2 := TCharacter.ToUpper (String2);
Length1 := Length (String1);
Length2 := Length (String2);
SetLength (WorkMatrix, Length1+1, Length2+1);
for I := 0 to Length1 do
  WorkMatrix [I, 0] := I;
for J := 0 to Length2 do
  WorkMatrix [0, J] := J;
for I := 1 to Length1 do
  for J := 1 to Length2 do
    begin
    if (String1 [I] = String2 [J]) then
      Cost := 0
    else
      Cost := 1;
    Val1 := WorkMatrix [I-1, J] + 1;
    Val2 := WorkMatrix [I, J-1] + 1;
    Val3 := WorkMatrix[I-1, J-1] + Cost;
    if (Val1 < Val2) then
      if (Val1 < Val3) then
        WorkMatrix [I, J] := Val1
      else
        WorkMatrix [I, J] := Val3
    else
      if (Val2 < Val3) then
        WorkMatrix [I, J] := Val2
      else
        WorkMatrix [I, J] := Val3;
    end;
Result := WorkMatrix [Length1, Length2];
end;
like image 38
jpfollenius Avatar answered Sep 21 '22 16:09

jpfollenius