Can anyone explain the output of the following code?
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupsTest {
    public void testGroups() throws Exception {
        String TEST_STRING = "After Yes is group 1 End";
        {
            String pattern = "(?:Yes|No)(.*)End";
            Pattern p = Pattern.compile(pattern);
            Matcher m = p.matcher(TEST_STRING);
            boolean f = m.find();
            int count = m.groupCount();
            // start(1)/end(1) throw IllegalStateException if find() returned false
            int start = m.start(1);
            int end = m.end(1);
            System.out.println("Pattern=" + pattern + "\t Found=" + f + " Group count=" + count
                    + " Start of group 1=" + start + " End of group 1=" + end);
        }
        {
            String pattern = "(?:Yes)|(?:No)(.*)End";
            Pattern p = Pattern.compile(pattern);
            Matcher m = p.matcher(TEST_STRING);
            boolean f = m.find();
            int count = m.groupCount();
            int start = m.start(1);
            int end = m.end(1);
            System.out.println("Pattern=" + pattern + "\t Found=" + f + " Group count=" + count
                    + " Start of group 1=" + start + " End of group 1=" + end);
        }
    }

    // Entry point so the test can also be run without a test framework
    public static void main(String[] args) throws Exception {
        new GroupsTest().testGroups();
    }
}
Which gives the following output:
Pattern=(?:Yes|No)(.*)End Found=true Group count=1 Start of group 1=9 End of group 1=21
Pattern=(?:Yes)|(?:No)(.*)End Found=true Group count=1 Start of group 1=-1 End of group 1=-1
The difference is that in the second pattern, "(?:Yes)|(?:No)(.*)End", concatenation ("X followed by Y" in "XY") has higher precedence than alternation ("either X or Y" in "X|Y"), just as multiplication has higher precedence than addition. The pattern is therefore equivalent to
"(?:Yes)|(?:(?:No)(.*)End)"
What you wanted is the following pattern:
"(?:(?:Yes)|(?:No))(.*)End"
This yields the same output as your first pattern.
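To check the equivalence claims directly, here is a small sketch that runs one find() per pattern on the test string and reports group 1 as a half-open range. The class name PrecedenceDemo and the helper group1Span are hypothetical names chosen for illustration; only Pattern.compile, Matcher.find, Matcher.start(int) and Matcher.end(int) come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecedenceDemo {
    // Runs a single find() and returns {start(1), end(1)}, or null if there is no match.
    static int[] group1Span(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null; // start(1)/end(1) would throw IllegalStateException without a match
        }
        return new int[] {m.start(1), m.end(1)};
    }

    public static void main(String[] args) {
        String input = "After Yes is group 1 End";
        String[] patterns = {
            "(?:Yes|No)(.*)End",         // first pattern from the question
            "(?:Yes)|(?:No)(.*)End",     // second pattern from the question
            "(?:Yes)|(?:(?:No)(.*)End)", // second pattern with the implicit grouping written out
            "(?:(?:Yes)|(?:No))(.*)End"  // corrected second pattern
        };
        for (String regex : patterns) {
            int[] span = group1Span(regex, input);
            String shown = (span == null) ? "no match" : "[" + span[0] + ", " + span[1] + ")";
            System.out.println(regex + " -> group 1 = " + shown);
        }
    }
}
```

The first and fourth patterns should report group 1 as [9, 21), and the second and third as [-1, -1), which is consistent with the second pattern being parsed as the third.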
In your test, the second pattern reports group 1 at the (empty) range [-1, -1) because that group did not match: the start -1 is included and the end -1 is excluded, so the half-open interval is empty.
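A minimal sketch of how the three cases look from the Matcher side: a group that captured text, a group that participated but captured the empty string, and a group that did not participate at all. It relies on Matcher.group(int) returning null for a group that did not participate; the class NonParticipatingGroup, the helper describeGroup1 and the extra pattern "Yes(x*)" are hypothetical, added only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonParticipatingGroup {
    // Describes capturing group 1 of the first match of regex in input.
    static String describeGroup1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        if (m.start(1) == -1) {
            // The group did not participate: start(1) and end(1) are -1 and group(1) is null.
            return "group 1 did not participate (group(1) = " + m.group(1) + ")";
        }
        // The group participated: [start, end) is half-open, so end - start is the captured length.
        return "group 1 = \"" + m.group(1) + "\", length " + (m.end(1) - m.start(1));
    }

    public static void main(String[] args) {
        String input = "After Yes is group 1 End";
        System.out.println(describeGroup1("(?:Yes|No)(.*)End", input));     // participates, non-empty
        System.out.println(describeGroup1("Yes(x*)", input));               // participates, captures ""
        System.out.println(describeGroup1("(?:Yes)|(?:No)(.*)End", input)); // does not participate
    }
}
```

Note that an empty capture still has a real position (here start and end are both 9), which is what distinguishes it from the -1 of a non-participating group.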
A capturing group is a group that may capture input. If it captures, one also says it matches some substring of the input. If the regex contains alternatives, not every capturing group necessarily captures input, so some groups may not match even when the regex as a whole matches.
The group count, as returned by Matcher.groupCount(), is determined purely by counting the capturing groups in the pattern, regardless of whether any of them can match on a given input. Your pattern has exactly one capturing group, (.*), which is group 1. The documentation states:
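Because the count depends only on the pattern, it can be read from a Matcher before any match is attempted. This sketch uses a matcher over an empty string purely to call groupCount(); the class GroupCountDemo and the extra sample patterns are hypothetical illustrations.

```java
import java.util.regex.Pattern;

public class GroupCountDemo {
    // groupCount() depends only on the compiled pattern, so no match attempt is needed.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String[] patterns = {
            "(?:Yes|No)(.*)End",     // only (.*) captures
            "(?:Yes)|(?:No)(.*)End", // still only (.*), whatever the input
            "(Yes)|(No)",            // two capturing groups
            "(?<word>Yes)(?:No)",    // a named group counts; (?:...) does not
            "((a)b)"                 // nested capturing groups each count
        };
        for (String regex : patterns) {
            System.out.println(regex + " -> groupCount() = " + groupCount(regex));
        }
    }
}
```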
(?:X) X, as a non-capturing group
and explains:
Groups beginning with (? are either pure, non-capturing groups that do not capture text and do not count towards the group total, or named-capturing group.
Whether any specific group matches on a given input is irrelevant to that definition. For example, the pattern (Yes)|(No) has two groups ((Yes) is group 1, (No) is group 2), but only one of them can participate in any single match.
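The (Yes)|(No) case can be observed by printing the start of every group after one find(). This is a sketch; the class WhichAlternative and the helper groupStarts are hypothetical names, and "Maybe No" is an invented sample input.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhichAlternative {
    // Returns start(g) for g = 1..groupCount() after one find(), or null if nothing matched.
    static int[] groupStarts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        int[] starts = new int[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            starts[g - 1] = m.start(g); // -1 means group g did not participate
        }
        return starts;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"Yes", "No", "Maybe No"}) {
            System.out.println(input + " -> " + Arrays.toString(groupStarts("(Yes)|(No)", input)));
        }
    }
}
```

In each result exactly one entry is -1, because only one alternative can take part in a given match.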
Matcher.find() returns true if the regex matched some substring of the input. You can determine which groups matched by looking at their start: if it is -1, the group did not match, and its end is -1 as well. There is no built-in method that tells you how many capturing groups actually matched after a call to find() or matches(); you have to count them yourself by checking each group's start.
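Counting them yourself can be as small as the loop below. It is a sketch under two assumptions of mine: group 0 (the whole match) is excluded, and the helper throws when there is no match, since start(int) is only defined after a successful match. CountParticipatingGroups and countParticipating are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountParticipatingGroups {
    // Counts capturing groups 1..groupCount() whose start is not -1 after one find().
    // Group 0 (the whole match) is deliberately excluded.
    static int countParticipating(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no match for " + regex);
        }
        int count = 0;
        for (int g = 1; g <= m.groupCount(); g++) {
            if (m.start(g) != -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "After Yes is group 1 End";
        System.out.println(countParticipating("(?:Yes|No)(.*)End", input));     // group 1 captured
        System.out.println(countParticipating("(?:Yes)|(?:No)(.*)End", input)); // group 1 did not
        System.out.println(countParticipating("(a)?(b)?(c)", "ac"));            // groups 1 and 3 only
    }
}
```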
When it comes to backreferences, also note what the regex tutorial has to say:
There is a difference between a backreference to a capturing group that matched nothing, and one to a capturing group that did not participate in the match at all.
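In java.util.regex this difference is observable: a backreference to a group that captured the empty string matches the empty string, whereas a backreference to a group that did not participate fails. Other regex flavours do not all behave the same way here, so treat this as Java-specific. The class name BackreferenceDemo and the sample patterns are illustrative.

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {
    public static void main(String[] args) {
        // (a?) always participates; on "b" it captures the empty string,
        // so \1 also matches the empty string and the whole input matches.
        System.out.println(Pattern.matches("(a?)b\\1", "b"));   // true
        // (a)? is skipped on "b", so group 1 does not participate; in java.util.regex
        // a backreference to such a group fails, so the whole input does not match.
        System.out.println(Pattern.matches("(a)?b\\1", "b"));   // false
        // When (a)? does participate, the backreference works as usual.
        System.out.println(Pattern.matches("(a)?b\\1", "aba")); // true
    }
}
```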
To summarise:

1) The two patterns give different results because of the precedence rules of the operators.
(?:Yes|No)(.*)End matches (Yes or No) followed by .*End.
(?:Yes)|(?:No)(.*)End matches either Yes, or No followed by .*End.

2) The second pattern gives a group count of 1 but a start and end of -1 because of the (not necessarily intuitive) meanings of the results returned by the Matcher method calls.
Matcher.find() returns true if a match was found. In your case, the match was on the (?:Yes) part of the pattern.
Matcher.groupCount() returns the number of capturing groups in the pattern, regardless of whether they actually participated in the match. In your case, only the non-capturing (?:Yes) part of the pattern participated in the match, but the capturing group (.*) is still part of the pattern, so the group count is 1.
Matcher.start(n) and Matcher.end(n) return the start index and the offset after the last character of the subsequence captured by the nth capturing group. In your case, although an overall match was found, the (.*) capturing group did not participate in the match and so did not capture a subsequence, hence the -1 results.

3) (Question asked in a comment.) To determine how many capturing groups actually captured a subsequence, call Matcher.start(n) for every n from 0 to Matcher.groupCount() inclusive and count the results that are not -1. (Note that group 0 represents the whole match, so you may want to exclude it for your purposes.)
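Point 2 can be seen in one line per pattern by printing the whole match (group 0) next to groupCount() and the span of group 1. This is a sketch; WholeMatchDemo and summarize are hypothetical names, and the output format is my own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeMatchDemo {
    // Summarises one find(): the whole match (group 0), groupCount() and group 1's span.
    static String summarize(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return "match=\"" + m.group() + "\" [" + m.start() + ", " + m.end() + ")"
                + " groupCount=" + m.groupCount()
                + " group1=[" + m.start(1) + ", " + m.end(1) + ")";
    }

    public static void main(String[] args) {
        String input = "After Yes is group 1 End";
        System.out.println(summarize("(?:Yes|No)(.*)End", input));
        System.out.println(summarize("(?:Yes)|(?:No)(.*)End", input));
    }
}
```

For the second pattern the whole match is just "Yes" at [6, 9), while groupCount() is still 1 and group 1 is [-1, -1).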
Due to the low precedence of the "|" operator, the second pattern is equivalent to:
(?:Yes)|(?:(?:No)(.*)End)
What you want is
(?:(?:Yes)|(?:No))(.*)End
When using regular expressions, it is important to remember that there is an implicit AND operator (concatenation) at work. This can be seen from the Javadoc for java.util.regex.Pattern covering the logical operators:
Logical operators
XY X followed by Y
X|Y Either X or Y
(X) X, as a capturing group
This implicit AND takes precedence over the OR (|) in the second Pattern, so the second Pattern is equivalent to (?:Yes)|(?:(?:No)(.*)End).
For it to be equivalent to the first Pattern, it must be changed to (?:(?:Yes)|(?:No))(.*)End.
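The same precedence rule shows up with single characters, which may make it easier to see. In this sketch (the class name and sample strings are illustrative), "ab|cd" means "ab" or "cd", not "a, then b or c, then d"; only explicit grouping gives the second reading.

```java
import java.util.regex.Pattern;

public class ConcatenationPrecedence {
    public static void main(String[] args) {
        // "ab|cd" parses as (ab)|(cd): concatenation binds tighter than |.
        System.out.println(Pattern.matches("ab|cd", "cd"));      // true: right alternative
        System.out.println(Pattern.matches("ab|cd", "abd"));     // false: not a(b|c)d
        // Grouping the alternation explicitly changes the meaning.
        System.out.println(Pattern.matches("a(?:b|c)d", "abd")); // true
    }
}
```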