Regular Expression named capturing groups support in Java 7

Since Java 7, the regular expression API offers support for named capturing groups. The method java.util.regex.Matcher.group(String) returns the input subsequence captured by the given named capturing group, but there is no example in the API documentation.

What is the right syntax to specify and retrieve a named capturing group in Java 7?

asked Dec 16 '14 05:12 by Rafael Borja



2 Answers

Specifying named capturing group

Take the following regex, which has a single capturing group, as an example: ([Pp]attern).

Below are 4 examples of how to specify a named capturing group for the regex above:

(?<Name>[Pp]attern)
(?<group1>[Pp]attern)
(?<name>[Pp]attern)
(?<NAME>[Pp]attern)

Note that the name of the capturing group must strictly match the following pattern:

[A-Za-z][A-Za-z0-9]* 
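A minimal sketch, assuming you just want to check a few candidate names: it writes the naming rule above as a Java regex and compares it with what Pattern.compile actually accepts when the candidate is placed in (?<name>x). The class and method names (GroupNameCheck, matchesNamingRule, compilesAsGroupName) are made up for illustration, and the compile check is only meaningful for simple candidates like these (a name containing > would change the structure of the pattern).

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class GroupNameCheck {
    // The naming rule quoted above: one ASCII letter, then ASCII letters or digits.
    private static final Pattern GROUP_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    // Checks a candidate name against the rule itself.
    static boolean matchesNamingRule(String name) {
        return GROUP_NAME.matcher(name).matches();
    }

    // Asks java.util.regex directly whether "(?<name>x)" compiles.
    static boolean compilesAsGroupName(String name) {
        try {
            Pattern.compile("(?<" + name + ">x)");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"Name", "group1", "name", "NAME", "1group", "my_name"};
        for (String candidate : candidates) {
            System.out.printf("%-8s rule=%b compiles=%b%n",
                    candidate, matchesNamingRule(candidate), compilesAsGroupName(candidate));
        }
    }
}
```

The four names from the examples pass both checks, while a name starting with a digit or containing an underscore is rejected by both.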

The group name is case-sensitive, so you must use the exact group name when referring to it (see below).
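To illustrate the case sensitivity, here is a small sketch using the first example pattern; the helper name captured is hypothetical. Matcher.group(String) throws IllegalArgumentException when the pattern has no group with exactly the given name, so asking for "name" instead of "Name" fails.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaseSensitiveNames {
    // Finds the first match of regex in input and returns what the group called
    // groupName captured. Matcher.group(String) throws IllegalArgumentException
    // when the pattern has no group with exactly that (case-sensitive) name.
    static String captured(String regex, String input, String groupName) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match in: " + input);
        }
        return m.group(groupName);
    }

    public static void main(String[] args) {
        String regex = "(?<Name>[Pp]attern)";
        System.out.println(captured(regex, "a Pattern here", "Name")); // Pattern
        try {
            captured(regex, "a Pattern here", "name");
        } catch (IllegalArgumentException e) {
            System.out.println("\"name\" does not refer to the group \"Name\"");
        }
    }
}
```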

Backreference the named capturing group in regex

To back-reference the content matched by a named capturing group within the regex (corresponding to the 4 examples above):

\k<Name>
\k<group1>
\k<name>
\k<NAME>

The named capturing group is still numbered, so in all 4 examples it can also be back-referenced with \1 as usual.
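A short sketch of both kinds of backreference, using a doubled-word search as an assumed example (the class and method names are illustrative). The two patterns differ only in \k<word> versus \1, and both find the same repeat; the \b anchors keep "is" inside "this" from counting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedBackreference {
    // A word, a space, then the same word again, referenced by name...
    static final Pattern BY_NAME = Pattern.compile("\\b(?<word>\\w+) \\k<word>\\b");
    // ...and the same thing referenced by number: the named group is also group 1.
    static final Pattern BY_NUMBER = Pattern.compile("\\b(?<word>\\w+) \\1\\b");

    // Returns the first doubled word, or null if there is none.
    static String firstRepeat(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group("word") : null;
    }

    public static void main(String[] args) {
        String text = "this is is a test";
        System.out.println(firstRepeat(BY_NAME, text));   // is
        System.out.println(firstRepeat(BY_NUMBER, text)); // is
    }
}
```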

Refer to named capturing group in replacement string

To refer to the capturing group in a replacement string (corresponding to the 4 examples above):

${Name}
${group1}
${name}
${NAME}

As with backreferences, in all 4 examples the content of the capturing group can also be referred to with $1 in the replacement string.
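A sketch of both replacement forms with Matcher.replaceAll, assuming the (?<name>[Pp]attern) example and a made-up input sentence; the class and method names are illustrative. Both calls wrap every match in square brackets and produce the same result.

```java
import java.util.regex.Pattern;

public class NamedReplacement {
    static final Pattern P = Pattern.compile("(?<name>[Pp]attern)");

    // ${name} in the replacement inserts the text captured by the group called "name".
    static String wrapByName(String input) {
        return P.matcher(input).replaceAll("[${name}]");
    }

    // The same group is also group 1, so $1 inserts the same text.
    static String wrapByNumber(String input) {
        return P.matcher(input).replaceAll("[$1]");
    }

    public static void main(String[] args) {
        String text = "Pattern matching with a pattern";
        System.out.println(wrapByName(text));   // [Pattern] matching with a [pattern]
        System.out.println(wrapByNumber(text)); // [Pattern] matching with a [pattern]
    }
}
```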

Named capturing group in COMMENTS mode

This section uses (?<name>[Pp]attern) as the example.

Oracle's implementation of COMMENTS mode (Pattern.COMMENTS, embedded flag (?x)) parses each of the following examples as identical to the regex above:

(?x)  (  ?<name>             [Pp] attern  )
(?x)  (  ?<  name  >         [Pp] attern  )
(?x)  (  ?<  n  a m    e  >  [Pp] attern  )

Except for ?<, which must not be separated, arbitrary spacing is allowed, even between the characters of the capturing group's name.
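A quick way to check this claim on your own JDK is to compile the three spaced variants and confirm that each one captures the same text under the name "name" as the plain pattern. This is a sketch; the class name, helper name and input string are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentsModeNames {
    // The three COMMENTS-mode variants listed above.
    static final String[] SPACED = {
        "(?x)  (  ?<name>             [Pp] attern  )",
        "(?x)  (  ?<  name  >         [Pp] attern  )",
        "(?x)  (  ?<  n  a m    e  >  [Pp] attern  )",
    };

    // Returns what the group called "name" captured on the first match, or null if nothing matched.
    static String captureName(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group("name") : null;
    }

    public static void main(String[] args) {
        String input = "a pattern here";
        System.out.println(captureName("(?<name>[Pp]attern)", input)); // pattern
        for (String regex : SPACED) {
            System.out.println(captureName(regex, input));             // pattern
        }
    }
}
```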

Same name for different capturing groups?

While .NET, Perl and PCRE allow the same name to be used for different capturing groups, Java does not support this (as of Java 8): every capturing group in a pattern must have a unique name.
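A sketch showing what happens when a name is reused: Pattern.compile rejects the pattern with a PatternSyntaxException. The helper compileError and the two sample patterns are hypothetical; the exact error message is left to the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class DuplicateGroupNames {
    // Returns the error description if regex is rejected, or null if it compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Same name used twice: rejected by java.util.regex.
        System.out.println(compileError("(?<num>\\d+)|(?<num>[a-f]+)"));
        // Distinct names compile fine, so this prints null.
        System.out.println(compileError("(?<dec>\\d+)|(?<hex>[a-f]+)"));
    }
}
```

With distinct names, you can check afterwards which alternative matched by testing whether matcher.group("dec") or matcher.group("hex") is non-null.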

Named capturing group related APIs

New methods in the Matcher class support retrieving captured text by group name:

  • group(String name) (from Java 7)
  • start(String name) (from Java 8)
  • end(String name) (from Java 8)

The corresponding methods are missing from the MatchResult interface as of Java 8. There is an ongoing enhancement request, JDK-8065554, for this issue.
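A minimal sketch that uses all three name-based methods on the Matcher itself (since, per the answer, MatchResult does not offer them in Java 8). The pattern, input and the describe helper are assumptions for illustration; start(String) and end(String) need Java 8 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupApis {
    // Describes the first capitalized word as "text@start-end" using only the
    // name-based Matcher methods; returns null if there is no such word.
    static String describe(String input) {
        Matcher m = Pattern.compile("(?<word>[A-Z]\\w*)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group("word") + "@" + m.start("word") + "-" + m.end("word");
    }

    public static void main(String[] args) {
        System.out.println(describe("the Java regex")); // Java@4-8
    }
}
```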

As of Java 8, there is no API to get the list of named capturing groups in a regex, so we have to jump through extra hoops to get it. That said, such a list is of little use for most purposes other than writing a regex tester.
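One possible hoop, sketched here as an assumption rather than a complete solution, is to scan the regex source for "(?<name>" openings. The hypothetical namedGroups helper below skips backslash escapes and character classes, and relies on the name rule above ((?<= and (?<! are not followed by a letter, so they are ignored). It does not handle \Q...\E quoting or COMMENTS-mode spacing, and it assumes the regex is valid.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupNameLister {
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static boolean isAsciiLetterOrDigit(char c) {
        return isAsciiLetter(c) || (c >= '0' && c <= '9');
    }

    // Returns group names in the order their "(?<name>" openings appear.
    static List<String> namedGroups(String regex) {
        List<String> names = new ArrayList<>();
        int classDepth = 0; // > 0 while inside [...]; Java allows nested classes
        int i = 0;
        while (i < regex.length()) {
            char c = regex.charAt(i);
            if (c == '\\') {
                i += 2; // skip the backslash and the character it escapes
                continue;
            }
            if (c == '[') {
                classDepth++;
            } else if (c == ']' && classDepth > 0) {
                classDepth--;
            } else if (classDepth == 0 && regex.startsWith("(?<", i)
                    && i + 3 < regex.length() && isAsciiLetter(regex.charAt(i + 3))) {
                int j = i + 3;
                while (j < regex.length() && isAsciiLetterOrDigit(regex.charAt(j))) {
                    j++;
                }
                names.add(regex.substring(i + 3, j));
                i = j;
                continue;
            }
            i++;
        }
        return names;
    }

    public static void main(String[] args) {
        // A lookbehind, a "(?<x>" inside a character class and an escaped "(" are all skipped.
        String regex = "(?<year>\\d{4})-(?<month>\\d{2})(?<=\\d)[(?<x>]\\(?<notAGroup>";
        System.out.println(namedGroups(regex)); // [year, month]
    }
}
```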

answered Nov 04 '22 23:11 by nhahtdh


The new syntax for a named capturing group is (?<name>X), where X is the subexpression to capture and "name" is the name of the group. The following code captures the regex (\w+) (one or more word characters: letters, digits or underscore). To name this capturing group, add ?<teste> inside the parentheses, just before the regex to be captured.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern compile = Pattern.compile("(?<teste>\\w+)");
Matcher matcher = compile.matcher("The first word is a match");
if (matcher.find()) {
    String myNamedGroup = matcher.group("teste");
    System.out.printf("This is your named group: %s%n", myNamedGroup);
}

This code prints the following output:

This is your named group: The
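find() stops at the first match, which is why only "The" is printed. As a sketch extending the same pattern (the class and method names are made up), calling find() in a loop collects every word through the same group name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllNamedMatches {
    // Collects the text captured by the group "teste" for every match in input.
    static List<String> allWords(String input) {
        Matcher matcher = Pattern.compile("(?<teste>\\w+)").matcher(input);
        List<String> words = new ArrayList<>();
        while (matcher.find()) {
            words.add(matcher.group("teste"));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(allWords("The first word is a match")); // [The, first, word, is, a, match]
    }
}
```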

answered Nov 04 '22 21:11 by Rafael Borja