Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String pattern matching [closed]

I can't figure out how to solve this:

given two strings, one representing a pattern, one a random string, determine whether it pattern matches with the first string

ex:

string1: "aaba"
string2: "catcatdogcat"

thus, string1 and string2 are pattern matched

versus if string2 were "catcatcatcat" this would not be pattern matched.

Do this for any pattern and string.

I know it's recursion but I'm pretty stuck on... how to solve this

like image 641
user2457563 Avatar asked Oct 22 '13 18:10

user2457563


People also ask

What does string pattern matching mean?

Pattern matching and strings. By far the most common form of pattern matching involves strings of characters. In many programming languages, a particular syntax of strings is used to represent regular expressions, which are patterns describing string characters.

What is string pattern matching in Python?

Regular expressions, called regexes for short, are descriptions for a pattern of text. For example, a \d in a regex stands for a digit character — that is, any single numeral 0 to 9. Following regex is used in Python to match a string of three numbers, a hyphen, three more numbers, another hyphen, and four numbers.

What is string matching used for?

String matching or searching algorithms try to find places where one or several strings (also called patterns) are found within a larger string (searched text):

What is string matching in Ada?

(classic problem) Definition: The problem of finding occurrence(s) of a pattern string within another string or body of text. There are many different algorithms for efficient searching. Also known as exact string matching, string searching, text searching.


3 Answers

Ok, I'm gonna try to explain a recursion for this,sounds right but I don't have a chance to test it ( not at home ).

Take a vector v['size of alphabet'], where v[i] = how many letters from string2 = letter i from string 1.

In you case in the end it is : v['a'] = 3, v[b] =3;

You initialise the vector with 1.

For the rec function:

You take the first letter from string1 : a; Represent for a from string2 is the string that starts at string2 and ends at string2+v['a']; which is 'c'; You check if this is a valid solution untill now, and it is.

Then you go into rec( string1 + 1 ) , letter a again, since v['a'] still = 1 then you take the second a as = 'a'. You check if this is a valid soulution, and it is not because you have already defined the first a as 'c'. You go back in the recursion and increment v['a'], start from the begging.

You take the first letter of string1 : a; Represent from string2 which is 'ca' , ( now v['a'] = 2 ) check if valid. rec ( string1 +1 );

and so on... at a point you will reach v['a'] = 3 and v['b'] = 3; then with the rec function you will find the solution.

I for one find it easier to implement in a interative function but you said something about recursion so yeah.

like image 80
taigi tanaka Avatar answered Nov 18 '22 11:11

taigi tanaka


Take the number of unique letters. Then you want to iterate through all combinations of possible lengths for each letter using the following constraints:

  1. sum(length of letter * occurances of letter) has to be the length of string2
  2. Each length must be at least 1

That is, for 2 unique letters, and a string length of 4, the possible lengths are:

(1, 3) and (2, 2)

From here it's simple. For each unique letter you can find out the string that letter must represent for the given string, as you know the length of each letter. Then it's a matter mapping each letter to the string it must represent, and if at any time a letter corresponds to a string that didn't match an earlier instance of it, then you have no match.

For your example:

string1: "aaba"
string2: "catcatdogcat"

Here, for the iteration where the lengths are (3, 3). Since we know a is of length 3, we know the first iteration of a must be "cat". Then the next a, corresponds to "cat" (still have a match). Then the next 3 have to correspond to b. This is the first b so it can match any 3 chars. Then match a at the end to cat again, and you're done.

If you want a,b,c to be unique as outlined in @MartijnCourteaux comment(and in your question now that I read again), then at the end you can just check your map for common values, if there are any common values then you have no match.

If you have a match at ANY iteration, then the string matches the pattern. There is only no match, if there is no match at ALL iterations.

like image 44
Cruncher Avatar answered Nov 18 '22 12:11

Cruncher


This is quite easy to achieve:

Regex is the way to go. In Regex, there is something called a backreference. Backreferences need to match the very same string, the mentioned match group has already matched. i.e. the Regex ^([ab])\\1$ will match every String like aa or bb. The First group matches either a or b - but the backreference NEEDS to match the same thing, the matchgroup (in this case "1") matched.

So, all you need to do is: Convert your String-based pattern to a Regex pattern.

Example:

String regex = "^([a-z]+)\\1([a-z]+)\\1$";
   Pattern p = Pattern.compile(regex);
   Matcher m = p.matcher("catcatdogcat");

   if (m.matches()){
     System.out.println("matches!");
     System.out.println(m.group(0));
     System.out.println(m.group(1));
     System.out.println(m.group(2));

   }else{
    System.out.println("no matches!");
   }

produces:

matches!
catcatdogcat
cat
dog

this will EXACTLY match your given string "catcatdogcat", while match Group 1 beeing "cat" and match group 2 beeing "dog".

What you now need to do is:

  • Write a function, that checks your string pattern aaba char by char.
  • First occurence of a letter: replace it with ([a-z]+) and note the number of that matchgroup (Array, Hashmap, ...)
  • Any further occurence of the letter: replace it with \\1 (if the recorded number for the letter was 1)
  • Wrap the result with ^ and $.

Finally, your String aaba will be converted to ^([a-z]+)\\1([a-z]+)\\1$ and serve your needs. The pattern abccba will become the regex ^([a-z]+)([a-z]+)([a-z]+)\\3\\2\\1$

Finally use the matcher to check your given string.

This example assumes only lowercase characters, but you can extend it.

however it is imporant to keep the "+", cause the "*" would allow zero length-matches, which will make your regex match about ALL THE TIME.

Second example mentioned:

import java.util.regex.*;

public class HelloWorld {
  public static void main(String[] args) {
   String regex = "^([a-z]+)([a-z]+)([a-z]+)\\3\\2\\1$";
   Pattern p = Pattern.compile(regex);
   Matcher m = p.matcher("catdogcowcowdogcat");

   if (m.matches()){
     System.out.println("matches!");
     System.out.println(m.group(0));
     System.out.println(m.group(1));
     System.out.println(m.group(2));
     System.out.println(m.group(3));

   }else{
    System.out.println("no matches!");
   }
  }
}

produces:

matches!
catdogcowcowdogcat
cat
dog
cow

edit: if needed (even if it does not 100% match your requirements - see comments):

public static String convertToRegex(String pattern){
    String regex = "";
    Map<Character, Integer> refs = new HashMap<Character, Integer>();
    Integer i=1;
    for (Character c : pattern.toCharArray()){
      if (refs.containsKey(c)){
         //known.
         regex += "\\" + refs.get(c);
      }else{
         //unknown
         regex += "([a-z]+)";
         refs.put(c, i++);
      }
    }

    return "^" + regex + "$";
  }
like image 1
dognose Avatar answered Nov 18 '22 11:11

dognose