Get group names in java regex

Tags:

java

regex

I'm trying to write a method that takes both a pattern and a string and returns a map of group name -> matched result.

Example:

(?<user>.*) 

I would like to return a map containing "user" as the key and whatever it matches as the value.

The problem is that I can't seem to get the group names from the Java regex API. I can only get the matched values by name or by index. I don't have the list of group names, and neither Pattern nor Matcher seems to expose this information. I have checked the source and the information appears to be there; it's just not exposed to the user.

I tried both Java's java.util.regex and jregex. I would also be happy to use any other library that is good, supported and fast, as long as it supports this feature.

Roy Reznik asked Mar 23 '13 16:03



1 Answer

There is no public API in Java to obtain the names of the named capturing groups (at least before Java 20, which added Pattern.namedGroups()). I think this is a missing feature.

The easy way out is to pick out candidate group names from the pattern string, then try to access each candidate by name on a successful match. In other words, you can't confirm the exact names of the named capturing groups until you plug in a string that the pattern matches.

The regex that captures the names of the named capturing groups is \(\?<([a-zA-Z][a-zA-Z0-9]*)> (derived from the Pattern class documentation).

(The hard way is to implement a parser for regex and get the names of the capturing groups).

A sample implementation:

import java.util.Iterator;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexTester {

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        // The first line of input is the regex; the remaining lines are the text to search.
        String regex = scanner.nextLine();
        StringBuilder input = new StringBuilder();
        while (scanner.hasNextLine()) {
            input.append(scanner.nextLine()).append('\n');
        }

        Set<String> namedGroups = getNamedGroupCandidates(regex);

        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);

        int matchCount = 0;

        if (m.find()) {
            // Remove invalid groups: group(String) throws if the name is not defined in the pattern
            Iterator<String> i = namedGroups.iterator();
            while (i.hasNext()) {
                try {
                    m.group(i.next());
                } catch (IllegalArgumentException e) {
                    i.remove();
                }
            }

            // Print the first match, then every following match
            do {
                matchCount += 1;
                System.out.println("Match " + matchCount + ":");
                System.out.println("=" + m.group() + "=");
                System.out.println();
                printMatches(m, namedGroups);
            } while (m.find());
        }
    }

    private static void printMatches(Matcher matcher, Set<String> namedGroups) {
        // Named groups; "_" marks a group that did not participate in the match
        for (String name : namedGroups) {
            String matchedString = matcher.group(name);
            if (matchedString != null) {
                System.out.println(name + "=" + matchedString + "=");
            } else {
                System.out.println(name + "_");
            }
        }

        System.out.println();

        // Numbered groups; groupCount() is inclusive, so the loop uses <=
        for (int i = 1; i <= matcher.groupCount(); i++) {
            String matchedString = matcher.group(i);
            if (matchedString != null) {
                System.out.println(i + "=" + matchedString + "=");
            } else {
                System.out.println(i + "_");
            }
        }

        System.out.println();
    }

    private static Set<String> getNamedGroupCandidates(String regex) {
        Set<String> namedGroups = new TreeSet<String>();

        Matcher m = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>").matcher(regex);

        while (m.find()) {
            namedGroups.add(m.group(1));
        }

        return namedGroups;
    }
}

There is a caveat to this implementation, though. It currently doesn't work with regex in Pattern.COMMENTS mode.
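For the map the question asks for, the same candidate-and-verify idea can be condensed into one helper. This is only a minimal sketch: the class name NamedGroupMap and the method match are made up for illustration, it looks at the first match only, and it shares the Pattern.COMMENTS limitation mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this sketch.
public class NamedGroupMap {

    // Candidate pattern for "(?<name>", the same one used in the answer.
    private static final Pattern GROUP_NAME =
            Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Returns group name -> matched text for the first match of regex in input,
    // or an empty map if the regex does not match at all.
    static Map<String, String> match(String regex, String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return result;
        }
        Matcher names = GROUP_NAME.matcher(regex);
        while (names.find()) {
            String name = names.group(1);
            try {
                // The value is null if the group exists but did not participate.
                result.put(name, m.group(name));
            } catch (IllegalArgumentException e) {
                // False candidate (for example an escaped "\(?<x>"); skip it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {user=roy, host=example}
        System.out.println(match("(?<user>\\w+)@(?<host>\\w+)", "roy@example"));
    }
}
```

A LinkedHashMap keeps the groups in the order they appear in the pattern; any other Map implementation would work if ordering does not matter.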

nhahtdh answered Oct 02 '22 19:10