
Java regex: Repeating capturing groups

Tags: java, regex

An item is a comma-delimited list of one or more strings of numbers or characters, e.g.

"12"
"abc"
"12,abc,3"

I'm trying to match a list of zero or more bracketed items in Java, e.g.

""
"(12)"
"(abc,12)"
"(abc,12),(30,asdf)"
"(qqq,pp),(abc,12),(30,asdf,2),"

For the last example, this should return the following groups:

qqq,pp
abc,12
30,asdf,2

I've come up with the following (incorrect) pattern:

\((.+?)\)(?:,\((.+?)\))*

which captures only the following for the last example:

qqq,pp
30,asdf,2

Tips? Thanks

Justin Wong asked Aug 04 '11 10:08



2 Answers

That's right. You can't have a "variable" number of capturing groups in a Java regular expression; the number of groups is fixed by the pattern itself. Your pattern has two groups:

\((.+?)\)(?:,\((.+?)\))*
  |___|        |___|
 group 1      group 2

Each group holds only the content of its last match. Group 2 sits inside the repeated (?:...)*, so its first capture, abc,12, gets overwritten by 30,asdf,2.
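To see the overwriting happen, here is a minimal sketch that runs the pattern from the question once and dumps each numbered group. The class name RepeatedGroupDemo and the helper captured are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The pattern from the question: group 2 lives inside a repeated non-capturing group.
    static final Pattern ORIGINAL = Pattern.compile("\\((.+?)\\)(?:,\\((.+?)\\))*");

    // Hypothetical helper: returns what each numbered group holds after one find().
    static String[] captured(String input) {
        Matcher m = ORIGINAL.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount()]; // groupCount() excludes group 0
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints qqq,pp and then 30,asdf,2; abc,12 is lost.
        for (String g : captured("(qqq,pp),(abc,12),(30,asdf,2),")) {
            System.out.println(g);
        }
    }
}
```

groupCount() reports 2 no matter how many times the repetition runs, which is why the middle item cannot be recovered from this match.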

Related question:

  • Regular expression with variable number of groups?

The solution is to use a single expression (something like \((.+?)\)) and call matcher.find() repeatedly to iterate over the matches.
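A minimal sketch of that approach, assuming the input looks like the examples in the question. The class name BracketedGroups and the method extractGroups are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketedGroups {
    // One capturing group only; repetition is handled by calling find() in a loop.
    private static final Pattern GROUP = Pattern.compile("\\((.+?)\\)");

    // Hypothetical helper: collects the contents of every bracketed item in order.
    static List<String> extractGroups(String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = GROUP.matcher(input);
        while (matcher.find()) {          // each call advances to the next "(...)"
            groups.add(matcher.group(1)); // text between the brackets
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints qqq,pp then abc,12 then 30,asdf,2, one per line.
        for (String g : extractGroups("(qqq,pp),(abc,12),(30,asdf,2),")) {
            System.out.println(g);
        }
    }
}
```

Note that this sketch does not validate the separators between the brackets; it simply picks up every "(...)" it finds, and an empty input yields an empty list.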

aioobe answered Sep 25 '22 08:09


You can use a regular expression like ([^,]+) in a loop, or just str.split(",") to get all elements at once. The variant str.split("\\s*,\\s*") also tolerates spaces around the commas.
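As a rough sketch, the split variant could be applied to the contents of one bracketed item like this. The class name ItemSplitter and the method splitItems are hypothetical, and the trim() call is an addition here to cover whitespace at the ends of the string, which the split regex alone would leave in place.

```java
import java.util.Arrays;
import java.util.List;

class ItemSplitter {
    // Hypothetical helper: splits the inside of one bracketed item into its elements.
    static List<String> splitItems(String group) {
        // trim() handles whitespace at the ends; the regex absorbs whitespace around commas.
        return Arrays.asList(group.trim().split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        System.out.println(splitItems("30 , asdf,2")); // prints [30, asdf, 2]
    }
}
```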

AlexR answered Sep 25 '22 08:09