
Strange behavior in regexes

There was a question about regexes, and while trying to answer it I found another strange thing.

String x = "X";
System.out.println(x.replaceAll("X*", "Y"));

This prints YY. Why?

String x = "X";
System.out.println(x.replaceAll("X*?", "Y"));

And this prints YXY.

Why doesn't the reluctant regex match the 'X' character? The input is effectively "nothing"X"nothing". Why doesn't the first regex match all three pieces at once, instead of matching two of them and then one? And why does the second regex match only the "nothing"s and not the X?

shift66 asked Jan 18 '23 12:01


2 Answers

Let's consider them in turn:

"X".replaceAll("X*", "Y")

There are two matches:

  1. At character position 0, X is matched, and is replaced with Y.
  2. At character position 1, the empty string is matched, and Y gets added to the output.

End result: YY.

"X".replaceAll("X*?", "Y")

There are also two matches:

  1. At character position 0, the empty string is matched, and Y gets added to the output. The character at this position, X, was not consumed by the match, and is therefore copied into the output verbatim.
  2. At character position 1, the empty string is matched, and Y gets added to the output.

End result: YXY.
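
To see these matches directly, here is a minimal sketch that lists the start and end index of every match found by `Matcher.find()`. The class name `MatchTrace` and the helper `trace` are made up for illustration; an empty match shows up as a span whose start and end are equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTrace {
    // Hypothetical helper: returns each match as "[start,end)".
    // A span with start == end is an empty match.
    static List<String> trace(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add("[" + m.start() + "," + m.end() + ")");
        }
        return spans;
    }

    public static void main(String[] args) {
        // Greedy: consumes X at 0, then matches the empty string at 1.
        System.out.println(trace("X*", "X"));   // prints [[0,1), [1,1)]
        // Reluctant: empty match at 0 (X is skipped), then empty match at 1.
        System.out.println(trace("X*?", "X"));  // prints [[0,0), [1,1)]
    }
}
```

In both cases `replaceAll` substitutes Y for each listed span and copies any characters not covered by a span unchanged.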

NPE answered Jan 25 '23 23:01


The * is a tricky 'quantifier' since it means '0 or more'. Thus, it also matches '0 times X' (i.e. an empty string).

I would use

"X".replaceAll("X+", "Y")

which has the expected behaviour.
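
As a rough comparison, the sketch below runs both quantifiers over the same inputs. The class name `PlusVersusStar` and its two helper methods are illustrative only. Because `X+` needs at least one X, it never produces an empty match, so no extra Y appears.

```java
class PlusVersusStar {
    // Hypothetical helpers wrapping the two patterns under discussion.
    static String withStar(String s) { return s.replaceAll("X*", "Y"); }
    static String withPlus(String s) { return s.replaceAll("X+", "Y"); }

    public static void main(String[] args) {
        System.out.println(withStar("X"));     // prints YY
        System.out.println(withPlus("X"));     // prints Y
        // With X*, every position where no X starts also yields an empty match.
        System.out.println(withStar("aXXb"));  // prints YaYYbY
        System.out.println(withPlus("aXXb"));  // prints aYb
    }
}
```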

Willem Mulder answered Jan 25 '23 21:01