Possible Duplicate:
Java regex anomaly?
Any idea why the following test fails (replaceAll returns "xx" instead of "x")?
    @Test
    public void testReplaceAll() {
        assertEquals("x", "xyz".replaceAll(".*", "x"));
    }
I don't want to work around it with "^.*$"; I want to understand this behavior.
Any clues?
Yes, it is exactly the same as described in this question!
The pattern .* will first match the whole input, but then also an empty string at the end of the input.
Let's symbolize the regex engine's position with | and the input with <...> in your example:

1. input: <xyz>
2. regex engine, before the first match: <|xyz>
3. regex engine, after the first match: <xyz|> (matched text: "xyz")
4. regex engine, after the second match: <xyz>| (matched text: "")

Not all regex engines behave this way. Java does, however. So does Perl. sed, as a counterexample, will position its cursor after the end of the input in step 3.
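The two matches described above can be observed directly with java.util.regex.Matcher. This is only a minimal sketch: the class name MatchTrace and the helper trace are made up for illustration and are not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTrace {
    // Hypothetical helper: records every match as "start-end:text".
    static List<String> trace(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // First match covers "xyz" (0-3), second is the empty string at 3-3.
        System.out.println(trace(".*", "xyz"));          // [0-3:xyz, 3-3:]
        // replaceAll substitutes both matches, hence "xx".
        System.out.println("xyz".replaceAll(".*", "x")); // xx
    }
}
```

Each of the two matches is replaced by "x", which is where the second "x" in the result comes from.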
Now, you also have to understand one crucial thing: after a zero-length match, a regex engine always advances one character before trying again. Otherwise, consider what would happen if you attempted to replace '^' with 'a': '^' matches a position, so it is a zero-length match. If the engine didn't advance one character, it would find the same empty match at the same position again and again: "x" would become "ax", then "aax", and so on, without ever terminating. So, after the second match, which is empty, Java's regex engine advances one "character"... of which there aren't any: end of processing.
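To illustrate the zero-length behavior, here is a small hedged sketch using only String.replaceAll. The class name ZeroLengthDemo and the method outputs are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.List;

public class ZeroLengthDemo {
    // Hypothetical helper: collects the results of a few replaceAll calls.
    static List<String> outputs() {
        return Arrays.asList(
            // '^' is a zero-length match at position 0; it is replaced exactly once.
            "x".replaceAll("^", "a"),
            // The empty pattern matches at every position, advancing one character each time.
            "xyz".replaceAll("", "-"),
            // ".+" needs at least one character, so it cannot match the empty string at the end.
            "xyz".replaceAll(".+", "x")
        );
    }

    public static void main(String[] args) {
        System.out.println(outputs()); // [ax, -x-y-z-, x]
    }
}
```

The last call shows that a pattern which cannot match the empty string produces a single replacement, since nothing is left to match once the cursor sits at the end of the input.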