Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Levenshtein to Damerau-Levenshtein

I'm sitting here and programmnging some algorithms for my main program in Java (well the first one so far). I programmed the levenshtein algorithm just fine thanks to wiki being so nice with the pseudocode to newbeginners plus a nice tutorial :D

I then decided to upgrade to Damerau and added the extra lines but then I read that it's not DL algo but OptimalStringAlignmentDistance instead. I tried reading the actionscript code to understand what more I needed to add to make it to DL but got confused instead. I've been to different places with code looking alike to Java but they all use the wrong pseudocode too.

After spending half the day I gave up and decided to ask here. Is there anyone who can help me with upgrading this code to Damerau-Levenshtein in Java?

    public class LevensteinDistance {
        private static int Minimum(int a, int b, int c) {
            return Math.min(Math.min(a, b), c);
        }

        private static int Minimum (int a, int b) {
            return Math.min(a, b);
        }

        public static int computeLevensteinDistance(String s, String t){
            int d[][];
            int n; // length of s
            int m; // length of t
            int i; // iterates through s
            int j; // iterates through t
            char s_i; // ith character of s
            char t_j; // jth character of t
            int cost; // cost

            n = s.length ();
            m = t.length ();
            if (n == 0) {
                return m;
            }
            if (m == 0) {
                return n;
            }
            d = new int[n+1][m+1];

            for (i = 0; i <= n; i++) {
                d[i][0] = i;
            }

            for (j = 0; j <= m; j++) {
                d[0][j] = j;
            }

            for(i = 1; i <= n; i++) {
                s_i = s.charAt (i - 1);
                for(j = 1; j <= m; j++) {
                    t_j = t.charAt (j - 1);

                    if(s_i == t_j){
                        cost = 0;
                    }else{
                        cost = 1;
                    }
                    d[i][j] = Minimum(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);

                    if(i > 1 && j > 1 && s_i == t_j-1 && s_i-1 == t_j){ 
                        d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
                    }
                }
            }
        return d[n][m];
    }

    //    public static void main(String[] args0){
    //      String a = "I decided it was best to ask the forum if I was doing it right";
    //      String b = "I thought I should ask the forum if I was doing it right";
    //      System.out.println(computeLevensteinDistance(a, b));
    //    }
}

Here is the Wikipedia page for the Damerau–Levenshtein distance algorithm

like image 311
N00programmer Avatar asked May 17 '11 15:05

N00programmer


1 Answers

Your problem is in referencing the previous characters from the string in your conditional. In your original code you have:

if(i > 1 && j > 1 && s_i == t_j-1 && s_i-1 == t_j){ 
  d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
}

The problem is the values t_j-1 and s_i-1. These say the ith character of s and t minus 1, where the algorithm says you want the (ith minus 1) characters. For example if string s is "AFW" and i is 1 then

s_i - 1 = E; //the character value (s[1]='F') minus 1 = 'E'
s.charAt(i-1) = A; //i-1 = 0, s[0] = 'A'

so your conditional should read:

if(i > 1 && j > 1 && s_i == t.charAt(j-1) && s.charAt(i-1) == t_j) { 
  d[i][j] = Minimum(d[i][j], d[i-2][j-2] + cost);
}

EDIT: Unforutnately I do not understand the algorithm from reading the code, however here is a translation of the ActionScript example from the wikipedia page in Java, which gives an output that matches your example:

public static int damerauLevenshteinDistance(
      String a, String b, int alphabetLength) {
    final int INFINITY = a.length() + b.length();
    int[][] H = new int[a.length()+2][b.length()+2];  
    H[0][0] = INFINITY;
    for(int i = 0; i<=a.length(); i++) {
      H[i+1][1] = i;
      H[i+1][0] = INFINITY;
    }
    for(int j = 0; j<=b.length(); j++) {
      H[1][j+1] = j;
      H[0][j+1] = INFINITY;
    }      
    int[] DA = new int[alphabetLength];
    Arrays.fill(DA, 0);
    for(int i = 1; i<=a.length(); i++) {
      int DB = 0;
      for(int j = 1; j<=b.length(); j++) {
        int i1 = DA[b.charAt(j-1)];
        int j1 = DB;
        int d = ((a.charAt(i-1)==b.charAt(j-1))?0:1);
        if(d==0) DB = j;
        H[i+1][j+1] =
          min(H[i][j]+d,
              H[i+1][j] + 1,
              H[i][j+1]+1, 
              H[i1][j1] + (i-i1-1) + 1 + (j-j1-1));
      }
      DA[a.charAt(i-1)] = i;
    }
    return H[a.length()+1][b.length()+1];
  }

  private static int min(int ... nums) {
    int min = Integer.MAX_VALUE;
    for (int num : nums) {
      min = Math.min(min, num);
    }
    return min;
  }
like image 51
M. Jessup Avatar answered Oct 03 '22 10:10

M. Jessup