
Finding all of the matching substrings, not only the "most extended" one

The code

String s = "y z a a a b c c z";
Pattern p = Pattern.compile("(a )+(b )+(c *)c");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}

prints

a a a b c c

which is right.

But logically, the substrings

a a a b c
a a b c c
a a b c
a b c c
a b c

match the regex too.

So, how can I make the code find those substrings too, i.e. not only the most extended one, but also its children?

sp00m asked Jun 27 '12 14:06


2 Answers

You can use the reluctant quantifiers, such as *? and +?. These match as little as possible, in contrast to the standard * and +, which are greedy, i.e. match as much as possible. Still, this only lets you find particular "sub-matches", not all of them. Some more control can be achieved with lookahead and non-capturing groups, also described in the docs. But to really find all sub-matches, you would probably have to do the work yourself, i.e. build the automaton to which the regex corresponds and navigate it using custom code.
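As a rough illustration of the "do it yourself" route, here is a minimal brute-force sketch. It does not build the automaton. Instead it tries every (start, end) window with Matcher.region() and Matcher.matches(), which is simpler but runs the regex a quadratic number of times. The class name AllSubMatches and the helper findAll are hypothetical, and the pattern assumes the "(c )*c" grouping that the asker most likely intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllSubMatches {

    // Hypothetical helper: returns every substring of s that the pattern
    // matches in full, ordered by start index and then by end index.
    static List<String> findAll(Pattern p, String s) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(s);
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                // Restrict the matcher to s[start, end) and require a full match.
                m.region(start, end);
                if (m.matches()) {
                    result.add(s.substring(start, end));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: "(c )*c" instead of the question's "(c *)c".
        Pattern p = Pattern.compile("(a )+(b )+(c )*c");
        for (String match : findAll(p, "y z a a a b c c z")) {
            System.out.println(match);
        }
    }
}
```

For the string from the question this should list all six substrings, the full match and its five "children". It is fine for short inputs, but for long strings walking the automaton would scale better.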

Michał Kosmulski answered Nov 10 '22 02:11


You will need a lazy quantifier.

Please try the following:

Pattern p = Pattern.compile("(a )+(b )+((c )*?)c");

Please also note that I grouped "c " once again, since I think that's what you want. Otherwise the * applies only to the space, so you would match arbitrarily many spaces, but not repeated "c".
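To see what the lazy quantifier changes, here is a small sketch comparing the greedy and the lazy variant on the string from the question. The class name GreedyVsLazy and the helper firstMatch are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {

    // Hypothetical helper: returns the first match of regex in s, or null if there is none.
    static String firstMatch(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "y z a a a b c c z";
        // Greedy (c )* takes as many "c " as it can: prints "a a a b c c"
        System.out.println(firstMatch("(a )+(b )+((c )*)c", s));
        // Lazy (c )*? takes as few as it can: prints "a a a b c"
        System.out.println(firstMatch("(a )+(b )+((c )*?)c", s));
    }
}
```

Note that the lazy pattern only shortens the match at the end. It still starts at the first "a", and find() resumes after the previous match, so this alone does not return the other substrings.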

Peter Wippermann answered Nov 10 '22 03:11