Can anyone tell me why
System.out.println("test".replaceAll(".*", "a"));
results in
aa
Note that the following has the same result:
System.out.println("test".replaceAll(".*$", "a"));
I have tested this on Java 6 and 7, and both behave the same way. Am I missing something, or is this a bug in the Java regex engine?
The difference between replace() and replaceAll() is that replace() treats its arguments literally (a char or a CharSequence), while replaceAll() interprets its first argument as a regular expression. Both replace every occurrence.
public String replaceAll(String regex, String replacement): replaces each substring of this string that matches the given regular expression with the given replacement.
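A minimal sketch of that difference, using the input "a.b" so that the dot means something different to each method. The class and method names are illustrative only; the last method uses the Pattern/Matcher form that the replaceAll() Javadoc describes as equivalent.

```java
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {
    // replace(CharSequence, CharSequence): the target "." is taken literally
    static String literal(String s) {
        return s.replace(".", "!");
    }

    // replaceAll(String, String): the first argument is a regex, so "." matches any character
    static String regex(String s) {
        return s.replaceAll(".", "!");
    }

    // Equivalent form via Pattern/Matcher
    static String viaPattern(String s) {
        return Pattern.compile(".").matcher(s).replaceAll("!");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b"));    // a!b
        System.out.println(regex("a.b"));      // !!!
        System.out.println(viaPattern("a.b")); // !!!
    }
}
```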
This is not an anomaly: .* can match anything, and you asked to replace all occurrences. Crucially, .* also matches the empty string! So after matching "test", it matches once more at the empty string at the end of the input, and replaces that with "a" as well.
Using .+ instead will not exhibit this problem, since this regex cannot match an empty string (it requires at least one character to match).
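A small sketch comparing the patterns from the question with .+ on the same input. The class and method names are hypothetical; the calls are plain String.replaceAll().

```java
public class DotStarVsDotPlus {
    // ".*" matches "test" and then the empty string at the end of the input
    static String star(String s) {
        return s.replaceAll(".*", "a");
    }

    // ".*$" behaves the same: the empty match at the end also satisfies "$"
    static String starDollar(String s) {
        return s.replaceAll(".*$", "a");
    }

    // ".+" needs at least one character, so no empty match is possible
    static String plus(String s) {
        return s.replaceAll(".+", "a");
    }

    public static void main(String[] args) {
        System.out.println(star("test"));       // aa
        System.out.println(starDollar("test")); // aa
        System.out.println(plus("test"));       // a
    }
}
```

Note that on an empty input, star("") returns "a": the only match is the empty string at position 0.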
Or, use .replaceFirst() to only replace the first occurrence:
"test".replaceFirst(".*", "a")
Now, why .* behaves like it does, and does not match more than twice (it theoretically could), is an interesting thing to consider. See below:
# Before first run
regex: |.*
input: |whatever
# After first run
regex: .*|
input: whatever|
# Before second run
regex: |.*
input: whatever|
# After second run: since .* can match an empty string, it is satisfied...
regex: .*|
input: whatever|
# However, this means the regex engine matched an empty input.
# Java's engine, like most regex engines, will in this situation
# shift one character further in the input.
# So, before the third run, the situation is:
regex: |.*
input: whatever<|ExhaustionOfInput>
# Nothing can ever match here: out
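The trace can be observed directly with java.util.regex.Matcher. This sketch (class and method names are hypothetical) records every match that find() reports, then rebuilds the replacement with a hand-written appendReplacement/appendTail loop, which has the same effect as replaceAll() here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTrace {
    // Records each match reported by Matcher.find() as "start-end:'text'"
    static List<String> matches(String regex, String input) {
        List<String> out = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":'" + m.group() + "'");
        }
        return out;
    }

    // Hand-written replace loop: every match find() reports,
    // including the empty one at the end, gets the replacement
    static String replaceEach(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matches(".*", "whatever"));      // [0-8:'whatever', 8-8:'']
        System.out.println(replaceEach(".*", "test", "a")); // aa
    }
}
```

The second match, 8-8, is the empty match at the end of the input; after it, find() starts one position further on, past the end, and fails.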
Note that, as @A.H. notes in the comments, not all regex engines behave this way. GNU sed, for instance, will consider that it has exhausted the input after the first match.