
How to calculate a matching score between two strings in Java?

I want to classify two strings as similar or not similar. For example:

s1 = "Token is invalid. DeviceId = deviceId: "345" "
s2 = "Token is invalid. DeviceId = deviceId: "123" "
s3 = "Could not send Message."

I am looking for a Java library that can give a matching score between two strings, so that I can use the score to decide whether they are similar or not. My program only needs to work on a small data set (~2000 strings). Is there something already available?

Sean Nguyen asked Jul 26 '13 19:07


3 Answers

You can use the Levenshtein distance as a matching score. A Java implementation is available here:

http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Java

Ezequiel answered Sep 21 '22 04:09


As suggested, here is the Levenshtein distance algorithm:

public class LevenshteinDistance
{
    private static int minimum(int a, int b, int c)
    {
        return Math.min(Math.min(a, b), c);
    }

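    // Returns the minimum number of single-character insertions, deletions,
    // and substitutions needed to turn str1 into str2.
    public static int computeLevenshteinDistance(CharSequence str1, CharSequence str2)
    {
        // distance[i][j] = edit distance between the first i chars of str1
        // and the first j chars of str2
        int[][] distance = new int[str1.length() + 1][str2.length() + 1];

        // Base cases: converting a prefix to or from the empty string
        for (int i = 0; i <= str1.length(); i++)
            distance[i][0] = i;
        for (int j = 1; j <= str2.length(); j++)
            distance[0][j] = j;

        for (int i = 1; i <= str1.length(); i++)
            for (int j = 1; j <= str2.length(); j++)
                distance[i][j] = minimum(distance[i - 1][j] + 1,      // deletion
                                         distance[i][j - 1] + 1,      // insertion
                                         distance[i - 1][j - 1] + ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1)); // match or substitution

        return distance[str1.length()][str2.length()];
    }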

    public static void main(String[] args)
    {
        String s1 = "Token is invalid. DeviceId = deviceId: \"345\" ";
        String s2 = "Token is invalid. DeviceId = deviceId: \"123\" ";
        String s3 = "Could not send Message.";

        System.out.println(computeLevenshteinDistance(s1, s2)); // s1 vs. s2
        System.out.println(computeLevenshteinDistance(s1, s3)); // s1 vs. s3
        System.out.println(computeLevenshteinDistance(s2, s3)); // s2 vs. s3

    }
}
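
The distance is a raw edit count, so on its own it does not say whether two strings are "similar": 3 edits is small for a 45-character message but large for a 4-character one. One common way to turn it into a score is to divide by the length of the longer string and subtract from 1, which gives a value between 0 (nothing in common) and 1 (identical). The sketch below assumes that normalization. The class name `StringSimilarity`, the method names, and the 0.8 threshold are illustrative choices, not part of the answer above, and the threshold would need tuning on real data. It also keeps only two rows of the table instead of the full matrix, which is an optional memory saving.

```java
public class StringSimilarity {

    // Levenshtein distance keeping only the previous and current row.
    public static int levenshtein(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Row 0: distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1,      // deletion
                                            curr[j - 1] + 1), // insertion
                                   prev[j - 1] + cost);       // match or substitution
            }
            // Swap rows so curr becomes prev for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Normalized score in [0, 1]; 1.0 means the strings are identical.
    public static double similarity(CharSequence a, CharSequence b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    // Assumed decision rule: similar if the score reaches the threshold.
    public static boolean isSimilar(CharSequence a, CharSequence b, double threshold) {
        return similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        String s1 = "Token is invalid. DeviceId = deviceId: \"345\" ";
        String s2 = "Token is invalid. DeviceId = deviceId: \"123\" ";
        String s3 = "Could not send Message.";

        double threshold = 0.8; // illustrative value only

        System.out.println(similarity(s1, s2) + " similar=" + isSimilar(s1, s2, threshold));
        System.out.println(similarity(s1, s3) + " similar=" + isSimilar(s1, s3, threshold));
    }
}
```

With the strings from the question, s1 and s2 differ only in the three digits, so they score well above the assumed threshold, while s1 and s3 fall well below it.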
JBuenoJr answered Sep 21 '22 04:09


For NLP problems in Java in general, you should check the Apache Lucene project. However, for your needs a simple Levenshtein distance algorithm is enough.

Sebastien Lorber answered Sep 18 '22 04:09