Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing strings with tolerance

I'm looking for a way to compare a string with an array of strings. Doing an exact search is quite easy of course, but I want my program to tolerate spelling mistakes, missing parts of the string and so on.

Is there some kind of framework which can perform such a search? I'm having something in mind that the search algorithm will return a few results order by the percentage of match or something like this.

like image 221
Oliver Hanappi Avatar asked Feb 26 '10 19:02

Oliver Hanappi


People also ask

What is the best way to compare strings?

The right way of comparing String in Java is to either use equals(), equalsIgnoreCase(), or compareTo() method. You should use equals() method to check if two String contains exactly same characters in same order. It returns true if two String are equal or false if unequal.

What are the 3 ways to compare two string objects?

There are three ways to compare String in Java: By Using equals() Method. By Using == Operator. By compareTo() Method.

How do you compare two string values?

Using String. equals() :In Java, string equals() method compares the two given strings based on the data/content of the string. If all the contents of both the strings are same then it returns true. If any character does not match, then it returns false.

Can you use == to compare two strings?

== operator is avoided, since it checks for reference equality, i.e. if the strings point to the same object or not. The methods mentioned in the article provide a meticulous way to compare two strings in the programming language of java.


1 Answers

You could use the Levenshtein Distance algorithm.

"The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character." - Wikipedia.com

This one is from dotnetperls.com:

using System;  /// <summary> /// Contains approximate string matching /// </summary> static class LevenshteinDistance {     /// <summary>     /// Compute the distance between two strings.     /// </summary>     public static int Compute(string s, string t)     {         int n = s.Length;         int m = t.Length;         int[,] d = new int[n + 1, m + 1];          // Step 1         if (n == 0)         {             return m;         }          if (m == 0)         {             return n;         }          // Step 2         for (int i = 0; i <= n; d[i, 0] = i++)         {         }          for (int j = 0; j <= m; d[0, j] = j++)         {         }          // Step 3         for (int i = 1; i <= n; i++)         {             //Step 4             for (int j = 1; j <= m; j++)             {                 // Step 5                 int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;                  // Step 6                 d[i, j] = Math.Min(                     Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),                     d[i - 1, j - 1] + cost);             }         }         // Step 7         return d[n, m];     } }  class Program {     static void Main()     {         Console.WriteLine(LevenshteinDistance.Compute("aunt", "ant"));         Console.WriteLine(LevenshteinDistance.Compute("Sam", "Samantha"));         Console.WriteLine(LevenshteinDistance.Compute("flomax", "volmax"));     } } 

You may in fact prefer to use the Damerau-Levenshtein distance algorithm, which also allows characters to be transposed, which is a common human error in data entry. You'll find a C# implementation of it here.

like image 186
D'Arcy Rittich Avatar answered Oct 31 '22 07:10

D'Arcy Rittich