To reproduce the problem described in a recent question, "Why does (.*)* make two matches and select nothing in group $1?", I tried various combinations of * and +, inside and outside the brackets, and the results were not what I expected.
I expected the same output as the one explained in the accepted answer to that question, and in another duplicate question tagged under Perl, "Why doesn't the .* consume the entire string in this Perl regex?". But my code is not behaving the same way.
To keep it simple, here's the code I tried:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String str = "input";
    String[] patterns = { "(.*)*", "(.*)+", "(.+)*", "(.+)+" };
    for (String pattern : patterns) {
        Matcher matcher = Pattern.compile(pattern).matcher(str);
        // Print group 1 and the start index of every match found.
        while (matcher.find()) {
            System.out.print("'" + matcher.group(1) + "' : '" + matcher.start() + "'" + "\t");
        }
        System.out.println();
    }
And this is the output I got for the 4 combinations:

    '' : '0'        '' : '5'        // For `(.*)*`
    '' : '0'        '' : '5'        // For `(.*)+`
    'input' : '0'   'null' : '5'    // For `(.+)*`
    'input' : '0'                   // For `(.+)+`
Now, what I can't understand is why, in the 1st and 2nd outputs, I am not getting the entire string as the first result of matcher.find(). Ideally, in the 1st case, .* should first capture the entire string and then also capture the empty string at the end. The 2nd match gives the expected result, but the 1st match does not.
Also, in the 2nd case, I should not even get a 2nd match, because I have a + quantifier outside the brackets.
My expected output is:

    'input' : '0'   '' : '5'    // For 1st
    'input' : '0'               // For 2nd
Also, in the 3rd output, why did I get null as the 2nd match instead of the empty string? Shouldn't the 2nd match be the same for the first 3 combinations?
The 4th output is as expected, so I have no doubts about that one.
You're seeing the effect of the same phenomenon you see in the question you linked to.

For (.*)*:

First match:
- matcher.start() is 0 because that's where the match ("input") starts.
- matcher.group(1) is "" because the repeated (.*) has overwritten the captured "input" with the empty string (but matcher.group(0) does contain "input"). After .* has consumed "input", the outer quantifier tries one more repetition, in which .* matches the empty string at position 5, and a capturing group only keeps what it matched in its last repetition.

Second match:
- matcher.start() is 5 because that's where the regex engine is after the first successful match.
- matcher.group(1) (as well as matcher.group(0)) is "" because that's all there was to match at the end of the string.

For (.*)+ it's the same. After all, the empty string can be repeated as many times as you want and still be the empty string.
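The difference between matcher.group(0) and matcher.group(1) described above can be checked directly. The following is only a minimal sketch: the class name GroupZeroDemo and the helper describe are made up for illustration, and the only library calls used are Pattern.compile, Matcher.find, group, start and end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question's code.
class GroupZeroDemo {
    // Returns one line per successful find(): the whole match (group 0),
    // the last capture of group 1, and the half-open span of the match.
    static List<String> describe(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            lines.add("group(0)='" + m.group(0) + "' group(1)='" + m.group(1)
                    + "' span=[" + m.start() + "," + m.end() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String regex : new String[] { "(.*)*", "(.*)+" }) {
            System.out.println(regex);
            for (String line : describe(regex, "input")) {
                System.out.println("  " + line);
            }
        }
    }
}
```

For both patterns, the first reported match should span [0,5) with group(0) equal to "input" and group(1) empty, and the second should be the empty match at position 5. This agrees with the output shown in the question, where only group(1) was printed.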
For (.+)* you get null because, while the second match succeeds (zero repetitions of a string of length 1 or more match the empty string), the capturing parentheses haven't been able to capture anything, so the group's contents are null (as in undefined, rather than the empty string).
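To tell a group that captured the empty string apart from a group that took no part in the match, one option is to look at matcher.start(1), which returns -1 for a group that matched nothing (equivalently, matcher.group(1) is null). This is a sketch only; UnsetGroupDemo and states are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question's code.
class UnsetGroupDemo {
    // For each match, records "unset" if group 1 did not participate,
    // otherwise the captured text in single quotes (possibly empty).
    static List<String> states(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // start(1) == -1 means the group matched nothing in this match.
            result.add(m.start(1) == -1 ? "unset" : "'" + m.group(1) + "'");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("(.+)* -> " + states("(.+)*", "input"));
        System.out.println("(.*)* -> " + states("(.*)*", "input"));
    }
}
```

With (.+)* the second match should be reported as unset, while with (.*)* both matches report an empty capture, which mirrors the null versus '' difference in the question's output.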