
Regex quantifiers "*" and "*?" behave oddly in Java's replaceAll() method [duplicate]

Possible Duplicate:
String.replaceAll() anomaly with greedy quantifiers in regex
Strange behavior in regexes

While

"a".replaceAll("a", "b")
"a".replaceAll("a+", "b")
"a".replaceAll("a+?", "b")

all return b, why does

"a".replaceAll("a*", "b")

return bb and

"a".replaceAll("a*?", "b")

return bab?

sp00m asked Feb 04 '13 17:02




2 Answers

"a".replaceAll("a*", "b")

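First, a* matches the a at the start of the string, which is replaced with b, and the pointer advances past the a. At the end of the string, a* matches again, this time the empty string, and that match is also replaced with b. Because the match was empty, the matcher advances the pointer by one, which moves it past the end of the string, so matching finishes and the result is bb.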

"a".replaceAll("a*?", "b")

First, a*? matches the empty string at the start of the string, which is replaced with b. It does not match the a because the ? in a*? makes the quantifier non-greedy (match as little as possible). Because the match was empty, the matcher advances the pointer by one, skipping the a, which is copied to the output unchanged. It then matches the empty string at the end of the string, replaces it with b, and falls off the end of the string, resulting in bab. The end result is the same as if you did "a".replaceAll("", "b").
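
The walk-through above can be checked with a small sketch that rebuilds the replacement loop from the public Matcher API (find(), appendReplacement(), appendTail()). The class name ReplaceAllTrace and the helper replaceAllByHand are made up for illustration, and the loop is only assumed to mirror what replaceAll() does internally; it is not copied from the JDK source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllTrace {

    // Hypothetical helper: replaces every match found by Matcher.find().
    static String replaceAllByHand(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Copies the unmatched text before this match, then the replacement.
            m.appendReplacement(out, replacement);
        }
        // Copies whatever is left after the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] regexes = {"a", "a+", "a+?", "a*", "a*?", ""};
        for (String regex : regexes) {
            String builtIn = "a".replaceAll(regex, "b");
            String byHand = replaceAllByHand("a", regex, "b");
            System.out.println("\"" + regex + "\" -> " + builtIn + " (by hand: " + byHand + ")");
        }
    }
}
```

Running it should print b for the first three patterns, bb for a*, and bab for both a*? and the empty pattern, with the hand-written loop agreeing with replaceAll() each time.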

John Dvorak answered Sep 29 '22 14:09


This happens because of zero-width matches.


"a".replaceAll("a*", "b")

This will match two times:

  1. Try to match at the beginning of the string; the greedy * consumes the a as a match.
  2. Advance to the next position in the string (now at the end of the string) and try to match there; the empty string matches.

    " a "
     \| \___ 2. match empty string
      \_____ 1. match "a"
    


"a".replaceAll("a*?", "b")

This will also match two times:

  1. Try to match at the beginning of the string; the non-greedy *? matches the empty string without consuming the a.
  2. Advance to the next position in the string (now at the end of the string) and try to match there; the empty string matches.

    " a "
     \  \___ 2. match empty string
      \_____ 1. match empty string
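
To see the two matches for each pattern directly, a minimal sketch can list the start and end offsets that Matcher.find() reports. The class name ZeroWidthMatches and the helper matchSpans are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthMatches {

    // Hypothetical helper: returns each match as "[start,end) "text"".
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // A zero-width match has start() == end() and an empty group().
            spans.add("[" + m.start() + "," + m.end() + ") \"" + m.group() + "\"");
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println("a*  : " + matchSpans("a*", "a"));
        System.out.println("a*? : " + matchSpans("a*?", "a"));
    }
}
```

For a* the spans should be [0,1) "a" and [1,1) "", and for a*? they should be [0,0) "" and [1,1) "", which lines up with the two diagrams.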
    
Qtax answered Sep 29 '22 16:09