
Regular expression with variable number of groups?

Tags: java, regex



According to the documentation, Java regular expressions can't do this:

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.

(emphasis added)
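
To see the documented behavior in practice, here is a minimal sketch that runs the example from the quote. The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Returns {group 1, group 2} after matching the whole input against (a(b)?)+,
    // or null if the input does not match.
    static String[] captures(String input) {
        Matcher m = Pattern.compile("(a(b)?)+").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] groups = captures("aba");
        // Group 1 holds only the last iteration ("a"); the earlier "ab" is gone.
        System.out.println("group 1 = " + groups[0]);
        // Group 2 keeps "b" from the first iteration, as the documentation states.
        System.out.println("group 2 = " + groups[1]);
    }
}
```

However many times the quantified group repeats, only one value per group is available afterwards.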


You can use split to get the fields you need into an array and loop through that.

http://download.oracle.com/javase/1,5.0/docs/api/java/lang/String.html#split(java.lang.String)
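
A rough sketch of the split approach, assuming the fields are separated by commas. The input format and the names below are assumptions, since the original input is not shown here.

```java
class SplitFieldsDemo {
    // Splits a delimited record into its fields.
    // The comma delimiter is an assumption made for this example.
    static String[] fields(String record) {
        return record.split(",");
    }

    public static void main(String[] args) {
        // Loop over however many fields the input happens to contain.
        for (String field : fields("alpha,beta,gamma")) {
            System.out.println(field);
        }
    }
}
```

Note that the argument to split is itself a regular expression, and that this one-argument form drops trailing empty strings from the result.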


I have not used Java regex, but for many languages the answer is: No.

Capturing groups seem to be created when the regex is parsed, and filled when it matches the string. The expression (a)|(b)(c) has three capturing groups, but any given match fills only one or two of them. (a)* has just one group; after matching, that group holds only the last repetition.
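
In Java specifically, this can be checked with Matcher.groupCount(), which reports the number of groups in the pattern regardless of the input. A small sketch, with hypothetical class and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Number of capturing groups in a pattern; fixed before any input is matched.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(a)|(b)(c)")); // three groups
        System.out.println(groupCount("(a)*"));       // one group

        Matcher m = Pattern.compile("(a)|(b)(c)").matcher("bc");
        if (m.matches()) {
            // The (a) branch did not take part in the match, so group 1 is null.
            System.out.println(m.group(1) + " " + m.group(2) + " " + m.group(3));
        }
    }
}
```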


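import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("ab(?:(c)|(d))*ef");
Matcher m = p.matcher("abcdef");
m.matches(); // true; m.group(1) is "c" and m.group(2) is "d"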

should do what you want.

EDIT:

@aioobe, I understand now. You want to be able to do something like the grammar

A    ::= <Foo> <Bars> <Baz>
Foo  ::= "foo"
Baz  ::= "baz"
Bars ::= <Bar> <Bars>
       | ε
Bar  ::= "A"
       | "B"

and pull out all the individual matches of Bar.

No, there is no way to do that in a single match using java.util.regex. You can capture the whole of Bars as one group and then run a second regex over that captured text, or use a parser generator like ANTLR and attach a side effect to Bar.
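
The two-step regex approach might look something like the sketch below. It assumes the tokens are written without separators, as in the grammar above; the class, field, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BarsExtractor {
    // Outer pattern: "foo", any run of Bar tokens captured as one group, then "baz".
    private static final Pattern OUTER = Pattern.compile("foo((?:A|B)*)baz");
    // Inner pattern: a single Bar token.
    private static final Pattern BAR = Pattern.compile("A|B");

    // Returns every Bar in the input, or an empty list if the input does not fit.
    static List<String> bars(String input) {
        List<String> result = new ArrayList<>();
        Matcher outer = OUTER.matcher(input);
        if (!outer.matches()) {
            return result;
        }
        // Second pass: walk the captured Bars text one token at a time.
        Matcher inner = BAR.matcher(outer.group(1));
        while (inner.find()) {
            result.add(inner.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bars("fooABBAbaz")); // [A, B, B, A]
    }
}
```

The outer match validates the overall shape and the inner find() loop collects the repeated items, so the number of results is not tied to the number of groups in either pattern.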