I implemented the Levenshtein algorithm in Java and am now getting the number of corrections made by the algorithm, a.k.a. the cost. This helps a little, but not much, since I want the result as a percentage.
So I want to know how to calculate a similarity percentage from that cost.
I would also like to know how you do it and why.
The Levenshtein distance is usually calculated by preparing a matrix of size (M+1)x(N+1), where M and N are the lengths of the two words, and looping through that matrix using two nested for loops, taking the minimum of the insertion, deletion and substitution costs in each iteration.
The usual way to check the similarity between data points or groups is to calculate the distance between them. The same applies to textual data: we check the similarity between strings by calculating the distance from one text to the other.
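The matrix approach can be sketched roughly as follows. This is a minimal illustration that assumes a cost of 1 for every insertion, deletion and substitution; the names `MatrixLevenshtein` and `distance` are made up for this example and do not come from any library.

```java
public class MatrixLevenshtein {

    // Hypothetical helper: full (M+1)x(N+1) matrix version, unit cost per edit.
    static int distance(String a, String b) {
        int m = a.length();
        int n = b.length();
        int[][] d = new int[m + 1][n + 1];

        // Turning a prefix of a into the empty string takes one deletion per character.
        for (int i = 0; i <= m; i++) {
            d[i][0] = i;
        }
        // Building a prefix of b from the empty string takes one insertion per character.
        for (int j = 0; j <= n; j++) {
            d[0][j] = j;
        }

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int substitution = d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                d[i][j] = Math.min(substitution, Math.min(deletion, insertion));
            }
        }
        // The bottom-right cell holds the distance between the complete strings.
        return d[m][n];
    }

    public static void main(String[] args) {
        System.out.println(distance("Hallo", "Hello"));    // prints 1
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

The full matrix uses O(M*N) memory. Since each row depends only on the previous one, keeping two rows is enough when only the final distance is needed.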
Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.
The Levenshtein distance is a string metric for measuring the difference between two sequences.
The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. (Wikipedia)
So if you need a percentage, you have to scale the distance by its maximum possible value, which is the length of the longer string. For example:
"Hallo", "Hello" -> Levenshtein distance 1. The maximum Levenshtein distance for these two strings is 5, so 20% of the characters do not match.
String s1 = "Hallo";
String s2 = "Hello";
int lfd = calculateLevensteinDistance(s1, s2);
double ratio = ((double) lfd) / Math.max(s1.length(), s2.length());
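To go from that mismatch ratio to a similarity percentage, you take the complement and guard against two empty strings. Here is a small sketch, assuming the length of the longer string is the maximum possible distance. `SimilarityPercent`, `similarityPercent` and the compact `distance` helper are illustrative names, and the helper stands in for whatever Levenshtein implementation you already have.

```java
public class SimilarityPercent {

    // Illustrative stand-in for your own Levenshtein implementation (single-row variant).
    static int distance(String a, String b) {
        int[] row = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            row[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int diagonal = row[0]; // value from the previous row, previous column
            row[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int above = row[j]; // value from the previous row, same column
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                row[j] = Math.min(Math.min(above + 1, row[j - 1] + 1), diagonal + cost);
                diagonal = above;
            }
        }
        return row[b.length()];
    }

    // Assumption: similarity = share of the longer string that did not need editing.
    static double similarityPercent(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 100.0; // two empty strings are identical; also avoids dividing by zero
        }
        return 100.0 * (maxLen - distance(a, b)) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(similarityPercent("Hallo", "Hello")); // prints 80.0
    }
}
```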
You can download Apache Commons Lang and investigate (and maybe use) the Levenshtein distance implementation in its StringUtils class.
// Full working example
public class Demo
{
    public static void main(String[] args)
    {
        String str1 = "12345";
        String str2 = "122345";
        int re = percentageOfTextMatch(str1, str2);
        System.out.println("Matching Percent: " + re);
    }

    public static int percentageOfTextMatch(String s0, String s1)
    {
        // Trim and collapse runs of whitespace into a single space
        s0 = s0.trim().replaceAll("\\s+", " ");
        s1 = s1.trim().replaceAll("\\s+", " ");
        // Two empty strings are identical; this also avoids dividing by zero
        if (s0.isEmpty() && s1.isEmpty())
            return 100;
        // Note: dividing by the summed lengths means two equal-length strings
        // never score below 50. Divide by Math.max(s0.length(), s1.length())
        // instead if you want the full 0-100 range.
        return (int) (100 - (float) levenshteinDistance(s0, s1) * 100 / (float) (s0.length() + s1.length()));
    }

    public static int levenshteinDistance(String s0, String s1) {
        int len0 = s0.length() + 1;
        int len1 = s1.length() + 1;
        // the arrays of distances (previous row and current row)
        int[] cost = new int[len0];
        int[] newCost = new int[len0];
        // initial cost of skipping prefix in String s0
        for (int i = 0; i < len0; i++)
            cost[i] = i;
        // dynamically computing the array of distances
        // transformation cost for each letter in s1
        for (int j = 1; j < len1; j++) {
            // initial cost of skipping prefix in String s1
            // (j characters, so the cost is j, not j - 1)
            newCost[0] = j;
            // transformation cost for each letter in s0
            for (int i = 1; i < len0; i++) {
                // matching current letters in both strings
                int match = (s0.charAt(i - 1) == s1.charAt(j - 1)) ? 0 : 1;
                // computing cost for each transformation
                int costReplace = cost[i - 1] + match;
                int costInsert = cost[i] + 1;
                int costDelete = newCost[i - 1] + 1;
                // keep minimum cost
                newCost[i] = Math.min(Math.min(costInsert, costDelete), costReplace);
            }
            // swap cost/newCost arrays
            int[] swap = cost;
            cost = newCost;
            newCost = swap;
        }
        // the distance is the cost for transforming all letters in both strings
        return cost[len0 - 1];
    }
}
Apache Commons Text provides a LevenshteinDistance class.
I think it is better to use this implementation than to write your own.
It can be used through the following Maven dependency:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.3</version>
</dependency>
As an example, have a look at the code below:
import org.apache.commons.text.similarity.LevenshteinDistance;

public class MetricUtils {
    private static final LevenshteinDistance lv = new LevenshteinDistance();

    public static void main(String[] args) {
        String s = "running";
        String s1 = "runninh";
        System.out.println(levenshteinRatio(s, s1));
    }

    // Similarity in the range 0..1: 1 minus the distance scaled by the longer length
    public static double levenshteinRatio(String s, String s1) {
        return 1 - ((double) lv.apply(s, s1)) / Math.max(s.length(), s1.length());
    }
}