
Regexp grouping and replaceAll with .* in Java duplicates the replacement

Tags:

java

regex

I have a problem using regexps in Java. The example code below prints ABC_012_suffix_suffix, but I expected it to print ABC_012_suffix.

    Pattern regexp  = Pattern.compile("(.*)");
    Matcher matcher = regexp.matcher("ABC_012");
    String  result  = matcher.replaceAll("$1_suffix");

    System.out.println(result);

I understand that replaceAll replaces every match. The question is: why does the group (.*) match twice on my string ABC_012 in Java?

UnixShadow asked Feb 17 '11 12:02


1 Answer

    Pattern regexp  = Pattern.compile(".*");
    Matcher matcher = regexp.matcher("ABC_012");
    matcher.matches();                                   // must return true before group(0) can be read
    System.out.println(matcher.group(0));
    System.out.println(matcher.replaceAll("$0_suffix")); // replaceAll resets the matcher before searching

The same thing happens here; the output is:

    ABC_012
    ABC_012_suffix_suffix

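The reason lies in how replaceAll works: it resets the matcher and then calls find() repeatedly, replacing every subsequence of the input that matches the pattern. Looping over find() yourself shows what it finds: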

    matcher.reset(); // start searching from the beginning of the input again
    while (matcher.find()) {
        System.out.printf("Start: %d, End: %d%n", matcher.start(), matcher.end());
    }

This will result in:

    Start: 0, End: 7
    Start: 7, End: 7

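Surprisingly, the matcher finds two subsequences: "ABC_012" and the empty string "" at index 7, the end of the input. Because .* can also match zero characters, the second call to find() succeeds with an empty match after the whole input has already been consumed. replaceAll then appends "_suffix" to both matches: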

    "ABC_012" + "_suffix" + "" + "_suffix"
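To make the mechanism concrete, here is a minimal sketch that rebuilds a replaceAll-style loop from the public find(), appendReplacement() and appendTail() methods and logs every match, including zero-length ones. The class EmptyMatchDemo and the helper manualReplaceAll are hypothetical names used only for illustration; this is a simplified equivalent, not the JDK's actual source. For comparison it also tries two patterns that cannot produce the trailing empty match: (.+), which needs at least one character, and ^(.*), which is anchored to the start of the input. It assumes Java 9 or later for the StringBuilder overloads.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {

    // Hypothetical helper: a simplified replaceAll-style loop built from the
    // public Matcher API (StringBuilder overloads require Java 9+).
    public static String manualReplaceAll(Pattern pattern, String input, String replacement) {
        Matcher m = pattern.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Log every match, including zero-length ones such as [7,7).
            System.out.printf("match [%d,%d): \"%s\"%n", m.start(), m.end(), m.group());
            // Copies the text since the previous match, then the expanded replacement.
            m.appendReplacement(sb, replacement);
        }
        // Copies whatever input follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "ABC_012";

        // (.*) matches the whole input, then the empty string at index 7.
        System.out.println(manualReplaceAll(Pattern.compile("(.*)"), input, "$1_suffix"));

        // (.+) cannot match an empty string, so there is only one match.
        System.out.println(manualReplaceAll(Pattern.compile("(.+)"), input, "$1_suffix"));

        // ^ only matches at the start of the input, so no second match at index 7.
        System.out.println(manualReplaceAll(Pattern.compile("^(.*)"), input, "$1_suffix"));
    }
}
```

With (.*) the log shows the full input followed by an empty match at [7,7), and the result is ABC_012_suffix_suffix, the same as matcher.replaceAll; the (.+) and ^(.*) variants each produce a single match and return ABC_012_suffix.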
Andreas Dolk answered Oct 16 '22 10:10