Java recursive(?) repeated(?) deep(?) pattern matching

I'm trying to get ALL the substrings in the input string that match the given pattern.

For example,

Given string: aaxxbbaxb
Pattern: a[a-z]{0,3}b
(What I actually want to express is: all the substrings that start with a and end with b, with up to 3 letters in between them)

Exact results that I want (with their indexes):

aaxxb: index 0~4
axxb: index 1~4
axxbb: index 1~5
axb: index 6~8

But when I run it through the Pattern and Matcher classes using Pattern.compile() and Matcher.find(), it only gives me:

aaxxb : index 0~4
axb : index 6~8

This is the piece of code I used.

Pattern pattern = Pattern.compile("a[a-z]{0,3}b", Pattern.CASE_INSENSITIVE);
Matcher match = pattern.matcher("aaxxbbaxb");
while (match.find()) {
    System.out.println(match.group());
}

How can I retrieve every single piece of string that matches the pattern?

Of course, it doesn't have to use Pattern and Matcher classes, as long as it's efficient :)

cnc4ever asked Oct 11 '22 02:10


1 Answer

(see: All overlapping substrings matching a Java regex)

Here is the full solution that I came up with. It can handle zero-width patterns, boundaries, etc. in the original regular expression. It loops over every substring [i, j) of the text and checks whether the regular expression matches at exactly that position. To do so, it wraps the pattern in a lookbehind that requires exactly i characters before the match and a lookahead that requires exactly text.length() - j characters after it. It seems to work for the cases I tried, although I haven't done extensive testing. It is most certainly less efficient than it could be, since it compiles a new pattern for every pair of positions.

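  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  // Prints every substring [i, j) of text that the regex matches exactly.
  public static void allMatches(String text, String regex)
  {
    for (int i = 0; i < text.length(); ++i) {
      for (int j = i + 1; j <= text.length(); ++j) {
        // The lookbehind pins the match start to index i,
        // the lookahead pins the match end to index j.
        String positionSpecificPattern = "((?<=^.{"+i+"})("+regex+")(?=.{"+(text.length() - j)+"}$))";
        Matcher m = Pattern.compile(positionSpecificPattern).matcher(text);

        if (m.find())
        {
          System.out.println("Match found: \"" + (m.group()) + "\" at position [" + i + ", " + j + ")");
        }
      }
    }
  }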
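
A possibly cheaper variant of the same brute-force idea is sketched below. Instead of building a new pattern for every pair of positions, it compiles the regex once and restricts a single Matcher to each candidate span with region(), then asks whether the whole span matches. The class name OverlappingMatches, the List<String> return type, and the output format are assumptions made for illustration, and CASE_INSENSITIVE is carried over from the question's code. Transparent, non-anchoring bounds are enabled so that lookarounds and boundary constructs in the regex can still see the text around the span. I have only traced this against the example from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class OverlappingMatches {

    // Returns one entry per span [i, j) of text that the regex matches in full,
    // formatted like the question's expected output (end index inclusive).
    public static List<String> allMatches(String text, String regex) {
        List<String> results = new ArrayList<>();
        // Compile once and reuse the same matcher for every candidate span.
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        // Let lookarounds and boundary constructs look past the region edges,
        // and stop ^ and $ from treating the region edges as input edges.
        m.useTransparentBounds(true);
        m.useAnchoringBounds(false);

        for (int i = 0; i < text.length(); i++) {
            for (int j = i + 1; j <= text.length(); j++) {
                // matches() succeeds only if the entire region [i, j) matches.
                if (m.region(i, j).matches()) {
                    results.add(text.substring(i, j) + ": index " + i + "~" + (j - 1));
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // For the question's input this prints the four spans the asker listed:
        // aaxxb: index 0~4, axxb: index 1~4, axxbb: index 1~5, axb: index 6~8
        allMatches("aaxxbbaxb", "a[a-z]{0,3}b").forEach(System.out::println);
    }
}
```

This still tries every pair of positions, so it makes a quadratic number of match attempts. If the pattern has a known maximum match length (5 characters for a[a-z]{0,3}b), the inner loop could stop at i plus that length.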
dsg answered Oct 14 '22 03:10