Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java: how to compare two strings in order to obtain the portions where they differ?

I would like to learn a way to obtain the portions where two strings differ.

Suppose I have these two strings:

String s1 = "x4.printString(\"Bianca.()\").y1();";
String s2 = "sb.printString(\"Bianca.()\").length();";

I would like this output: ["x4", "y1", "sb", "length"] coming from a method receiving s1and s2as arguments.

I have looked for something like this in other posts, but I have only found links to StringUtils.difference(String first, String second).

But this method returns the second string from the index where it begins to differ from the first one.
I really don't know where to start and any advice would be very appreciated.

UPDATE Following @aUserHimself advises, I managed to obtain all common subsequences among the two strings, but these subsequences come out like a unique String.

This is my code now:

private static int[][] lcs(String s, String t) {
    int m, n;
    m = s.length();
    n = t.length();
    int[][] table = new int[m+1][n+1];
    for (int i=0; i < m+1; i++)
        for (int j=0; j<n+1; j++)
            table[i][j] = 0;
    for (int i = 1; i < m+1; i++)
        for (int j = 1; j < n+1; j++)
            if (s.charAt(i-1) == t.charAt(j-1))
                table[i][j] = table[i-1][j-1] + 1;
            else
                table[i][j] = Math.max(table[i][j-1], table[i-1][j]);
    return table;
}

private static List<String> backTrackAll(int[][]table, String s, String t, int m, int n){
    List<String> result = new ArrayList<>();
    if (m == 0 || n == 0) {
        result.add("");
        return result;
    }
    else
        if (s.charAt(m-1) == t.charAt(n-1)) {
            for (String sub : backTrackAll(table, s, t, m - 1, n - 1))
                result.add(sub + s.charAt(m - 1));
            return result;
        }
        else {
            if (table[m][n - 1] >= table[m - 1][n])
                result.addAll(backTrackAll(table, s, t, m, n - 1));
            else
                result.addAll(backTrackAll(table, s, t, m - 1, n));
            return result;
        }
}

private List<String> getAllSubsequences(String s, String t){
    return backTrackAll(lcs(s, t), s, t, s.length(), t.length());
}



Calling getAllSubsequences on these two strings:

String s1 = "while (x1 < 5)"
String s2 = "while (j < 5)"

I receive this string: ["while ( < 5)"] not ["while (", " < 5)"] as I would like to obtain. I am not understanding where I am doing wrong.

like image 657
delca85 Avatar asked Oct 29 '22 10:10

delca85


2 Answers

Find the longest common subsequence between two strings. After that you can use indexOf to get index of this common string in between both strings and fetch uncommon values from both.

example :

CICROSOFK
WOCROSFGT

Common letter is

CROS

Find different string from 0 to index of SOFT and from index+'SOFT'.length to str.length

like image 159
rohitanand Avatar answered Nov 15 '22 04:11

rohitanand


I already flagged a duplicate question above whose answer uses Longest Common Subsequence for 2 Strings.

So you can apply it recursively and on each new recursion, use a placeholder where this LCS was found so that you can mark the parts that differ. In the end, when no more common sequences exist, you will have to split each String by the placeholder and get the required parts.

UPDATE 1: If I think better now, this recursion part might not lead to an optimal solution (from the total execution time point of view), since you will iterate over the Strings multiple times. But there might be a way to retrieve all sequences from one iteration by reusing (a reduced version of) the memoization table, check this implementation and this more detailed one.

UPDATE 2: I have managed to implement the recursive version (not optimal), based on this code:

public class LongestCommonSequence {

    private final char[] firstStr;
    private final char[] secondStr;
    private int[][] LCS;
    private String[][] solution;
    private int max = -1, maxI = -1, maxJ = -1;
    private static final Character SEPARATOR = '|';

    public LongestCommonSequence(char[] firstStr, char[] secondStr) {
        this.firstStr = firstStr;
        this.secondStr = secondStr;
        LCS = new int[firstStr.length + 1][secondStr.length + 1];
        solution = new String[firstStr.length + 1][secondStr.length + 1];
    }

    public String find() {

        for (int i = 0; i <= secondStr.length; i++) {
            LCS[0][i] = 0;
            if(i > 0) {
                solution[0][i] = "   " + secondStr[i - 1];
            }
        }

        for (int i = 0; i <= firstStr.length; i++) {
            LCS[i][0] = 0;
            if(i > 0) {
                solution[i][0] = "   " + firstStr[i - 1];
            }
        }

        solution[0][0] = "NONE";

        for (int i = 1; i <= firstStr.length; i++) {
            for (int j = 1; j <= secondStr.length; j++) {
                if (firstStr[i - 1] == secondStr[j - 1] && firstStr[i - 1] != SEPARATOR) {
                    LCS[i][j] = LCS[i - 1][j - 1] + 1;
                    solution[i][j] = "diag";
                } else {
                    LCS[i][j] = 0;
                    solution[i][j] = "none";
                }
                if(LCS[i][j] > max) {
                    max = LCS[i][j];
                    maxI = i;
                    maxJ = j;
                }
            }
        }

        System.out.println("Path values:");
        for (int i = 0; i <= firstStr.length; i++) {
            for (int j = 0; j <= secondStr.length; j++) {
                System.out.print(" " + LCS[i][j]);
            }
            System.out.println();
        }

        System.out.println();
        System.out.println("Path recovery:");
        for (int i = 0; i <= firstStr.length; i++) {
            for (int j = 0; j <= secondStr.length; j++) {
                System.out.print(" " + solution[i][j]);
            }
            System.out.println();
        }
        System.out.println();
        System.out.println("max:" + max + " maxI:" + maxI + " maxJ:" + maxJ);

        return printSolution(maxI, maxJ);
    }

    public String printSolution(int i, int j) {
        String answer = "";
        while(i - 1 >= 0 && j - 1 >= 0 && LCS[i][j] != 0) {
            answer = firstStr[i - 1] + answer;
            i--;
            j--;
        }
        System.out.println("Current max solution: " + answer);
        return answer;
    }

    public static void main(String[] args) {
        String firstStr = "x4.printString(\\\"Bianca.()\\\").y1();";
        String secondStr = "sb.printString(\\\"Bianca.()\\\").length();";
        String maxSubstr;
        LongestCommonSequence lcs;
        do {
            lcs = new LongestCommonSequence(firstStr.toCharArray(), secondStr.toCharArray());
            maxSubstr = lcs.find();
            if(maxSubstr.length() != 0) {
                firstStr = firstStr.replace(maxSubstr, "" + LongestCommonSequence.SEPARATOR);
                secondStr = secondStr.replace(maxSubstr, "" + LongestCommonSequence.SEPARATOR);
            }
        }
        while(maxSubstr.length() != 0);

        System.out.println();
        System.out.println("first:" + firstStr + " second: " + secondStr);

        System.out.println("First array: ");
        String[] firstArray = firstStr.split("\\" + SEPARATOR);
        String[] secondArray = secondStr.split("\\" + SEPARATOR);
        for(String s: firstArray) {
            System.out.println(s);
        }
        System.out.println();
        System.out.println("Second array: ");
        for(String s: secondArray) {
            System.out.println(s);
        }
    }
}
like image 39
aUserHimself Avatar answered Nov 15 '22 05:11

aUserHimself