Hello fellow programmers,
I would like to ask for some help with near matches of strings.
Currently, I have a program that stores description strings; users can search for a description by typing it completely or partially.
I would like to implement a near-match search. For example, the actual description is "hello world" but the user erroneously enters the search "hello eorld". The program should still be able to return "hello world" to the user.
I've tried looking at Pattern and matches to implement it, but that requires a regex to match strings, and my descriptions do not follow a regular pattern. I've also tried String.contains, but it doesn't seem to work either. Below is part of the code I tried to implement.
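import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

List<String> list = new ArrayList<>();
list.add("hello world");
list.add("go jogging at london");
list.add("go fly kite");
Scanner scan = new Scanner(System.in);
// Read the whole search phrase once; calling scan.next() inside the loop
// would consume a new single word from the input on every iteration.
String query = scan.nextLine();
for (int i = 0; i < list.size(); i++) {
    if (list.get(i).contains(query)) {
        System.out.println(list.get(i));
    }
}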
Could fellow programmers help me with this?
Using String.equals(): In Java, the String equals() method compares two strings based on their content. If all the characters of both strings are the same, it returns true. If any character does not match, it returns false.
If you need to know whether a string matches a regular expression RegExp in JavaScript, use RegExp.prototype.test(). If you only want the first match found, you might want to use RegExp.prototype.exec() instead.
Java String matches() method: this method tells whether or not this string matches the given regular expression. An invocation of the form str.matches(regex) yields exactly the same result as the expression Pattern.matches(regex, str).
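As a small illustration of why matches() alone does not help with near matches, here is a sketch (the class name MatchesDemo and the helper matchesWhole are made up for this example). It shows that the regex has to cover the entire string, and that a typo is only tolerated if the pattern explicitly allows for it.

```java
import java.util.regex.Pattern;

final class MatchesDemo {

    // Hypothetical helper: true only if the regex matches the whole of str.
    // Both calls are equivalent; they are combined here just to show that.
    static boolean matchesWhole(String str, String regex) {
        return str.matches(regex) && Pattern.matches(regex, str);
    }

    public static void main(String[] args) {
        String description = "hello world";
        // A partial pattern is not enough: the whole string must match.
        System.out.println(matchesWhole(description, "hello"));
        // A wildcard suffix makes the pattern cover the whole string.
        System.out.println(matchesWhole(description, "hello.*"));
        // A typo such as "hello eorld" only matches if the regex itself
        // leaves that position open, e.g. with a "." wildcard.
        System.out.println(matchesWhole(description, "hello .orld"));
    }
}
```

Since the search text comes from the user, writing such a pattern by hand for every possible typo is not practical, which is why a distance-based comparison is usually a better fit.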
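The Levenshtein distance can quantify the difference between two strings: it is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.
Here is an implementation taken from here: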
public class LevenshteinDistance {
    private static int minimum(int a, int b, int c) {
        return Math.min(Math.min(a, b), c);
    }

    public static int computeLevenshteinDistance(
            CharSequence str1,
            CharSequence str2) {
        int[][] distance = new int[str1.length() + 1][str2.length() + 1];

        for (int i = 0; i <= str1.length(); i++)
            distance[i][0] = i;
        for (int j = 1; j <= str2.length(); j++)
            distance[0][j] = j;

        for (int i = 1; i <= str1.length(); i++)
            for (int j = 1; j <= str2.length(); j++)
                distance[i][j] = minimum(
                        distance[i - 1][j] + 1,
                        distance[i][j - 1] + 1,
                        distance[i - 1][j - 1]
                                + ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1));

        return distance[str1.length()][str2.length()];
    }
}
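To tie this back to the question, the distance can be used to pick the stored description closest to what the user typed. The following is only a sketch under some assumptions: the class name NearMatchSearch, the method closest and the maxDistance threshold are made up for illustration, and distance is a two-row variant of the same algorithm so that the example is self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class NearMatchSearch {

    // Levenshtein distance, keeping only two rows of the table in memory.
    static int distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j chars of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i chars of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the description with the smallest distance to the query, or
    // empty if even the best one is more than maxDistance edits away.
    // maxDistance is an assumed tuning knob, not something from the question.
    static Optional<String> closest(List<String> descriptions, String query, int maxDistance) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : descriptions) {
            int d = distance(candidate.toLowerCase(), query.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return bestDistance <= maxDistance ? Optional.ofNullable(best) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("hello world", "go jogging at london", "go fly kite");
        // "hello eorld" is one substitution away from "hello world".
        System.out.println(closest(list, "hello eorld", 2).orElse("no match"));
    }
}
```

A fixed threshold is the simplest choice; for descriptions of very different lengths it may work better to scale the allowed distance with the length of the query.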
You can use the LCS (Longest Common Subsequence); see: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
import java.util.Scanner;

public class LCS {
    public static void main(String[] args) {
        // Read two whitespace-separated strings from standard input.
        Scanner in = new Scanner(System.in);
        String x = in.next();
        String y = in.next();
        int M = x.length();
        int N = y.length();

        // opt[i][j] = length of LCS of x[i..M] and y[j..N]
        int[][] opt = new int[M+1][N+1];

        // compute length of LCS and all subproblems via dynamic programming
        for (int i = M-1; i >= 0; i--) {
            for (int j = N-1; j >= 0; j--) {
                if (x.charAt(i) == y.charAt(j))
                    opt[i][j] = opt[i+1][j+1] + 1;
                else
                    opt[i][j] = Math.max(opt[i+1][j], opt[i][j+1]);
            }
        }

        // recover LCS itself and print it to standard output
        int i = 0, j = 0;
        while (i < M && j < N) {
            if (x.charAt(i) == y.charAt(j)) {
                System.out.print(x.charAt(i));
                i++;
                j++;
            }
            else if (opt[i+1][j] >= opt[i][j+1]) i++;
            else j++;
        }
        System.out.println();
    }
}
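The program above prints the subsequence itself; for a near-match search the length of the LCS is usually what matters. One way to use it, sketched here under assumptions (the class LcsSimilarity, the method similarity and the formula 2 * LCS / (length of x + length of y) are illustrative choices, not part of the answer above), is to turn it into a score between 0 and 1 and rank the stored descriptions by it.

```java
import java.util.Arrays;
import java.util.List;

final class LcsSimilarity {

    // Length of the longest common subsequence, using the same
    // bottom-up table as the program above.
    static int lcsLength(String x, String y) {
        int[][] opt = new int[x.length() + 1][y.length() + 1];
        for (int i = x.length() - 1; i >= 0; i--) {
            for (int j = y.length() - 1; j >= 0; j--) {
                opt[i][j] = (x.charAt(i) == y.charAt(j))
                        ? opt[i + 1][j + 1] + 1
                        : Math.max(opt[i + 1][j], opt[i][j + 1]);
            }
        }
        return opt[0][0];
    }

    // Assumed scoring: 1.0 for identical strings, 0.0 when no character is shared.
    static double similarity(String x, String y) {
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0;
        }
        return 2.0 * lcsLength(x, y) / (x.length() + y.length());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("hello world", "go jogging at london", "go fly kite");
        String query = "hello eorld";
        // Print every description with its score; the highest score is the best guess.
        for (String description : list) {
            System.out.println(description + " -> " + similarity(description, query));
        }
    }
}
```

Unlike the edit distance, a higher value here means a closer match, so the search would return the description with the largest score.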
Another solution is the Aho–Corasick string matching algorithm; see: Fast algorithm for searching for substrings in a string