Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use regex to match a^xb^x, where x is the number of times a,b appear

Tags:

java

To clarify, I want to know how to use a regex to match:

ab
aabb
aaabbb 
...

I just found out that this works in Perl:

if ($exp =~ /^(a(?1)?b)$/)

To understand this, look at the string as though it grows from the outside-in, not left-right:

ab
a(ab)b
aa(ab)bb

(?1) is a reference to the outer set of parentheses. We need the ? afterwards for the last case (going from outside in), nothing is left and ? means 0 or 1 of the preceding expression (so it essentially acts as our base case).

My questions is: What is the equivalent of (?1) in Java?

like image 666
Steve P. Avatar asked Nov 13 '22 03:11

Steve P.


2 Answers

In general, regular expressions are limited to regular languages, which means that since they are equivalent to languages acceptable by DFAs (discrete finite automata), they can't count, as generalized counting requires an infinite number of states to represent the infinite number of possible counted values.

Since discrete != infinte, you can't really count, but you can do some limited types of matching like in your (a (something) b) example.

Discusses a few limitations of DFAs (and Regular Languages / Regular Expressions by extension) http://www.cs.washington.edu/education/courses/cse599/99sp/admin/Slides/Week2/sld012.htm

A better, but more verbose slide that expands on the limitations by going into DFAs in (still a bit high level) detail http://www.cs.princeton.edu/courses/archive/spr05/cos126/lectures/18.pdf

By the way, the inside-out expansion is a cool DFA trick to basically side-step the actual need to count by using pattern matching that happens to rebuild the mirror image of the string. It looks like counting, but will fall apart as soon as you do more interesting things, like require matching in order (as opposed to mirrored order matching).

like image 182
Edwin Buck Avatar answered Nov 14 '22 21:11

Edwin Buck


You can use the Pattern and Matcher classes to count the number of occurrences of one of your two characters, then enforce this number in your regular expression using the {n} syntax.

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String  s       = "aaaabbbb";
        Pattern pattern = Pattern.compile("a");
        Matcher matcher = pattern.matcher(s);

        // count all 'a's
        int count = 0;
        while (matcher.find())
            count++;

        // now enforce count matches for a and b in your regular expression
        String rExp = String.format("a{%d}b{%d}", count, count);
        Pattern matchSameCount = Pattern.compile(rExp);

        Matcher m2 = matchSameCount.matcher(s);
        System.out.println( m2.matches()); 
        // prints true
    }
}

Overall it is a little more work, but it's the only way I can think of that actually works at the moment.

like image 29
Hunter McMillen Avatar answered Nov 14 '22 23:11

Hunter McMillen