I got a problem using Rexexp in Java. The example code writes out ABC_012_suffix_suffix
, I was expecting it to output ABC_012_suffix
Pattern rexexp = Pattern.compile("(.*)");
Matcher matcher = rexexp.matcher("ABC_012");
String result = matcher.replaceAll("$1_suffix");
System.out.println(result);
I understand that replaceAll replaces all matched groups, the questions is why is this regexp group (.*)
matching twice on my string ABC_012
in Java?
Pattern regexp = Pattern.compile(".*");
Matcher matcher = regexp.matcher("ABC_012");
matcher.matches();
System.out.println(matcher.group(0));
System.out.println(matcher.replaceAll("$0_suffix"));
Same happens here, the output is:
ABC_012
ABC_012_suffix_suffix
The reason is hidden in the replaceAll
method: it tries to find
all subsequences that match the pattern:
while (matcher.find()) {
System.out.printf("Start: %s, End: %s%n", matcher.start(), matcher.end());
}
This will result in:
Start: 0, End: 7
Start: 7, End: 7
So, to our first surprise, the matcher finds two subsequences, "ABC_012"
and another ""
. And it appends "_suffix"
to both of them:
"ABC_012" + "_suffix" + "" + "_suffix"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With