Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find out which group matches in Java regex without linear search?

I have some programmatically assembled huge regex, like this

(A)|(B)|(C)|...

Each sub-pattern is in its capturing group. When I get a match, how do I figure out which group matches without linearly testing each group(i) to see it returns a non-null string?

like image 886
Fortepianissimo Avatar asked Jul 24 '09 16:07

Fortepianissimo


People also ask

How do you check if a string matches a regex in Java?

To check if a String matches a Pattern one should perform the following steps: Compile a String regular expression to a Pattern, using compile(String regex) API method of Pattern. Use matcher(CharSequence input) API method of Pattern to create a Matcher that will match the given String input against this pattern.

WHAT IS group in regex Java?

Advertisements. Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".

Which pattern is used to match any non What character in Java?

All the characters other than the English alphabet (both cases) and, digits (0 to 9) are considered as non-word characters. You can match them using the meta character “\W”.

Does Matcher class matches the regular expression against the text provided?

It is used to create a matcher that will match the given input against this pattern. 5. It is used to compile the given regular expression and attempts to match the given input against it.


2 Answers

If your regex is programmatically generated, why not programmatically generate n separate regexes and test each of them in turn? Unless they share a common prefix and the Java regex engine is clever, all alternatives get tested anyway.

Update: I just looked through the Sun Java source, in particular, java.util.regex.Pattern$Branch.match(), and that does also simply do a linear search over all alternatives, trying each in turn. The other places where Branch is used do not suggest any kind of optimization of common prefixes.

like image 51
Thomas Avatar answered Sep 28 '22 15:09

Thomas


You can use non-capturing groups, instead of:

(A)|(B)|(C)|...

replace with

((?:A)|(?:B)|(?:C))

The non-capturing groups (?:) will not be included in the group count, but the result of the branch will be captured in the outer () group.

like image 24
jwiebe Avatar answered Sep 28 '22 15:09

jwiebe