
LevenshteinDistance - Commons Lang 3.0 API

With the Commons Lang API I can calculate the similarity between two strings using the Levenshtein distance. The result is the number of changes needed to turn one string into the other. I would like the result to fall within the range 0 to 1, which would make it easier to judge how similar the strings are: the closer the result is to 0, the greater the similarity. Is that possible?

Below is the example I'm using:

import org.apache.commons.lang3.StringUtils;

public class TesteLevenstein {

    public static void main(String[] args) {

        // Each call returns the raw edit count, not a value between 0 and 1.
        int distance1 = StringUtils.getLevenshteinDistance("Boat", "Coat");
        int distance2 = StringUtils.getLevenshteinDistance("Remember", "Alamo");
        int distance3 = StringUtils.getLevenshteinDistance("Steve", "Stereo");

        System.out.println("distance(Boat, Coat): " + distance1);
        System.out.println("distance(Remember, Alamo): " + distance2);
        System.out.println("distance(Steve, Stereo): " + distance3);

    }
}

Thanks!

Deb asked Jul 08 '11 19:07




1 Answer

Just divide by some number. The question is which number? Probably the maximum possible distance for the given pair of strings. I think that's the length of the longer string (i.e. every character of the shorter string is substituted, and the extra characters of the longer string are inserted). Dividing by that gives 0 for identical strings and 1 for strings that differ as much as they possibly can.
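A minimal sketch of that idea, assuming the divisor is the length of the longer string. The class name `NormalizedLevenshtein` and the methods `distance` and `normalized` are made up for this illustration and are not part of Commons Lang. The `distance` method is a plain stand-in so the example runs without the library; in the question's setup you could presumably call `StringUtils.getLevenshteinDistance` there instead.

```java
public class NormalizedLevenshtein {

    // Stand-in for StringUtils.getLevenshteinDistance: the classic
    // dynamic-programming edit distance, keeping only two rows of the table.
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: normalize by the longer string's length, which is the
    // largest distance the pair can have. 0.0 = identical, 1.0 = maximally different.
    public static double normalized(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 0.0; // two empty strings are identical; avoid dividing by zero
        }
        return (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println("normalized(Boat, Coat): " + normalized("Boat", "Coat"));
        System.out.println("normalized(Remember, Alamo): " + normalized("Remember", "Alamo"));
        System.out.println("normalized(Steve, Stereo): " + normalized("Steve", "Stereo"));
    }
}
```

For "Boat" and "Coat" the distance is 1 and the longer length is 4, so the normalized value is 0.25. The only special case is two empty strings, where the divisor would be zero; the sketch treats them as identical.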

MRAB answered Oct 05 '22 01:10
