
How to implement near matches of strings in Java?

Hello fellow programmers,

I would like to ask for some help with near matches of strings.

Currently, I have a program that stores description strings. Users can search for a description by typing it completely or partially.

I would like to implement a near-match search. For example, the actual description is "hello world", but the user erroneously enters the search "hello eorld". The program should still be able to return "hello world" to the user.

I've tried looking at Pattern and matches to implement it, but that requires a regex to match strings, and my descriptions do not follow a regular pattern. I've also tried String.contains, but it doesn't seem to work either. Below is part of the code I tried to implement.

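    ArrayList<String> list = new ArrayList<String>();
    list.add("hello world");
    list.add("go jogging at london");
    list.add("go fly kite");
    Scanner scan = new Scanner(System.in);
    // Read the whole search phrase once, before the loop; calling scan.next()
    // inside the loop would consume a new token on every iteration.
    String query = scan.nextLine();

    for (int i = 0; i < list.size(); i++) {
        // contains() only finds exact substrings, so "hello eorld" matches nothing
        if (list.get(i).contains(query)) {
            System.out.println(list.get(i));
        }
    }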

Could fellow programmers help me with this?

melyong asked Nov 02 '12 14:11



2 Answers

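The Levenshtein distance quantifies the difference between two strings: it is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other.

Here is an implementation, taken from here: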

public class LevenshteinDistance {
   private static int minimum(int a, int b, int c) {
      return Math.min(Math.min(a, b), c);
   }

   public static int computeLevenshteinDistance(
      CharSequence str1,
      CharSequence str2 )
   {
      int[][] distance = new int[str1.length() + 1][str2.length() + 1];

      for (int i = 0; i <= str1.length(); i++)
         distance[i][0] = i;
      for (int j = 1; j <= str2.length(); j++)
         distance[0][j] = j;

      for (int i = 1; i <= str1.length(); i++)
         for (int j = 1; j <= str2.length(); j++)
            distance[i][j] =
               minimum(
                  distance[i - 1][j] + 1,
                  distance[i][j - 1] + 1,
                  distance[i - 1][j - 1] +
                     ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1));

      return distance[str1.length()][str2.length()];
   }
}
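
To connect the distance back to the search in the question, here is a minimal sketch of how it could be used to pick the closest stored description. The class name `NearMatchSearch`, the method `closest`, and the `maxDistance` threshold are all hypothetical names introduced for illustration; the distance is recomputed with a two-row variant of the same recurrence so the block is self-contained. Lower-casing both strings before comparing is an assumption about the desired behaviour.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; names are illustrative and not part of the answer above.
class NearMatchSearch {

    // Levenshtein distance using two rows instead of the full matrix.
    static int distance(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the description with the smallest distance to the query,
    // or null if every description is more than maxDistance edits away.
    static String closest(List<String> descriptions, String query, int maxDistance) {
        String best = null;
        int bestDistance = maxDistance + 1;
        for (String candidate : descriptions) {
            int d = distance(candidate.toLowerCase(), query.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList(
                "hello world", "go jogging at london", "go fly kite");
        System.out.println(closest(list, "hello eorld", 2));
    }
}
```

A reasonable `maxDistance` probably depends on the length of the descriptions; a fixed value of 2 is only a starting point. Note that this compares the query against whole descriptions, so a short partial query will be many edits away from a long description.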
Aubin answered Oct 23 '22 06:10


You can use the LCS (Longest Common Subsequence); see: http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

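import java.util.Scanner;

public class LCS {

    public static void main(String[] args) {
        // Read two tokens from standard input (java.util.Scanner stands in for the StdIn helper class)
        Scanner in = new Scanner(System.in);
        String x = in.next();
        String y = in.next();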
        int M = x.length();
        int N = y.length();

        // opt[i][j] = length of LCS of x[i..M] and y[j..N]
        int[][] opt = new int[M+1][N+1];

        // compute length of LCS and all subproblems via dynamic programming
        for (int i = M-1; i >= 0; i--) {
            for (int j = N-1; j >= 0; j--) {
                if (x.charAt(i) == y.charAt(j))
                    opt[i][j] = opt[i+1][j+1] + 1;
                else 
                    opt[i][j] = Math.max(opt[i+1][j], opt[i][j+1]);
            }
        }

        // recover LCS itself and print it to standard output
        int i = 0, j = 0;
        while(i < M && j < N) {
            if (x.charAt(i) == y.charAt(j)) {
                System.out.print(x.charAt(i));
                i++;
                j++;
            }
            else if (opt[i+1][j] >= opt[i][j+1]) i++;
            else                                 j++;
        }
        System.out.println();

    }

}
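
The program above prints the subsequence itself. For ranking search results, the length of the LCS is usually what matters, so here is a minimal sketch that turns it into a score between 0 and 1. The class `LcsSimilarity` and the formula `2 * lcs / (len(x) + len(y))` are assumptions chosen for illustration, not something the answer prescribes.

```java
// Hypothetical helper; the scoring formula is one possible choice among many.
class LcsSimilarity {

    // Length of the longest common subsequence, using the same table as above.
    static int lcsLength(String x, String y) {
        int[][] opt = new int[x.length() + 1][y.length() + 1];
        for (int i = x.length() - 1; i >= 0; i--) {
            for (int j = y.length() - 1; j >= 0; j--) {
                if (x.charAt(i) == y.charAt(j)) {
                    opt[i][j] = opt[i + 1][j + 1] + 1;
                } else {
                    opt[i][j] = Math.max(opt[i + 1][j], opt[i][j + 1]);
                }
            }
        }
        return opt[0][0];
    }

    // Score in [0, 1]: 1.0 for identical strings, 0.0 when nothing is shared.
    static double similarity(String x, String y) {
        if (x.isEmpty() && y.isEmpty()) {
            return 1.0; // avoid dividing by zero; two empty strings are identical
        }
        return 2.0 * lcsLength(x, y) / (x.length() + y.length());
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("hello world", "hello eorld"));
        System.out.println(similarity("hello world", "go fly kite"));
    }
}
```

A search could then return the description with the highest score, possibly only when the score exceeds some cutoff that would need tuning on real data.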

Another option is the Aho–Corasick string matching algorithm, which finds exact occurrences of many patterns at once; see: Fast algorithm for searching for substrings in a string

Sajad Bahmani answered Oct 23 '22 07:10