I'm looking for a way to compare a string with an array of strings. Doing an exact search is quite easy of course, but I want my program to tolerate spelling mistakes, missing parts of the string and so on.
Is there some kind of framework which can perform such a search? I'm having something in mind that the search algorithm will return a few results order by the percentage of match or something like this.
The right way of comparing String in Java is to either use equals(), equalsIgnoreCase(), or compareTo() method. You should use equals() method to check if two String contains exactly same characters in same order. It returns true if two String are equal or false if unequal.
There are three ways to compare String in Java: By Using equals() Method. By Using == Operator. By compareTo() Method.
Using String. equals() :In Java, string equals() method compares the two given strings based on the data/content of the string. If all the contents of both the strings are same then it returns true. If any character does not match, then it returns false.
== operator is avoided, since it checks for reference equality, i.e. if the strings point to the same object or not. The methods mentioned in the article provide a meticulous way to compare two strings in the programming language of java.
You could use the Levenshtein Distance algorithm.
"The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character." - Wikipedia.com
This one is from dotnetperls.com:
using System; /// <summary> /// Contains approximate string matching /// </summary> static class LevenshteinDistance { /// <summary> /// Compute the distance between two strings. /// </summary> public static int Compute(string s, string t) { int n = s.Length; int m = t.Length; int[,] d = new int[n + 1, m + 1]; // Step 1 if (n == 0) { return m; } if (m == 0) { return n; } // Step 2 for (int i = 0; i <= n; d[i, 0] = i++) { } for (int j = 0; j <= m; d[0, j] = j++) { } // Step 3 for (int i = 1; i <= n; i++) { //Step 4 for (int j = 1; j <= m; j++) { // Step 5 int cost = (t[j - 1] == s[i - 1]) ? 0 : 1; // Step 6 d[i, j] = Math.Min( Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost); } } // Step 7 return d[n, m]; } } class Program { static void Main() { Console.WriteLine(LevenshteinDistance.Compute("aunt", "ant")); Console.WriteLine(LevenshteinDistance.Compute("Sam", "Samantha")); Console.WriteLine(LevenshteinDistance.Compute("flomax", "volmax")); } }
You may in fact prefer to use the Damerau-Levenshtein distance algorithm, which also allows characters to be transposed, which is a common human error in data entry. You'll find a C# implementation of it here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With