
Programming error leads to inexplicable regex

Tags: java, regex

For a test, I created the following regex by mistake:

|(\\w+)|

I was puzzled that this regex compiles and runs at all, and I can't explain the result:

 public static void main(String[] args) {
     String toReplace = "Hey I'm a lovely String an I'm giving my |value| worth!";
     // String replacement1 = "2 cent"; // I planned to replace |value| with 2 cent
     String replacement1 = "@"; // to produce a more readable output
     String regex = "|(\\w+)|"; // I forgot to escape the |
     String result = toReplace.replaceAll(regex, replacement1);
     System.out.println(result);
 }

The result is:

@H@e@y@ @I@'@m@ @a@ @l@o@v@e@l@y@ @S@t@r@i@n@g@ @a@n@ @I@'@m@ @g@i@v@i@n@g@ @m@y@ @|@v@a@l@u@e@|@ @w@o@r@t@h@!@

My idea so far is that Java tries to replace the "nothing" between the characters, but why not the characters themselves?

\\w+ should match the 'H'

I would expect every character to be replaced by three @ signs, or by only one. That the characters are not replaced at all puzzles me.

Joachim Weiß asked May 05 '15 12:05


1 Answer

You're right, this regex matches the empty string between each character.

Alternatives are tried from left to right. Since the first alternative (the empty string to the left of the first |) always matches, the rest of the pattern isn't even tried, so the \w+ is never reached by the matching engine. You could have written any (valid) pattern to the right of that first |; it would never be reached.
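To illustrate the point about the unreachable alternatives, here is a minimal sketch. The class name EmptyAlternativeDemo, the helper replaceAllWith, and the second pattern `|[0-9]{3}` are made up for this illustration; the escaped pattern in the last call is an assumption about what the question intended, based on its "I forgot to escape the |" comment.

```java
class EmptyAlternativeDemo {
    // Thin wrapper around String.replaceAll so the calls below read uniformly.
    static String replaceAllWith(String regex, String input, String replacement) {
        return input.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        // The pattern from the question: empty alternative first, then (\w+), then empty again.
        System.out.println(replaceAllWith("|(\\w+)|", "Hey", "@"));      // @H@e@y@

        // A completely different right-hand side gives the same result,
        // because the empty first alternative always wins.
        System.out.println(replaceAllWith("|[0-9]{3}", "Hey", "@"));     // @H@e@y@

        // With both pipes escaped they are literal characters,
        // and the pattern matches |value| as a whole.
        System.out.println(
            replaceAllWith("\\|(\\w+)\\|", "my |value| worth", "2 cent")); // my 2 cent worth
    }
}
```

Only the escaped variant ever consumes characters from the input; the other two patterns match nothing but empty strings.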

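The engine works the following way: it keeps a current position cursor in the subject string and tries to match starting at that position. Since your regex matches there (with an empty match), the replacement is performed at that point, and the cursor is moved to the end of the match.

But since the match is zero-width, the cursor would not move at all, so the engine instead advances by one character before trying again; not doing so would result in an infinite loop. The character that is skipped this way is copied to the output unchanged, which is why every original character survives with an @ inserted before it, plus one final @ for the empty match at the end of the string.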
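The cursor behaviour can be observed directly with Matcher.find(), which is a convenient way to inspect each match; this is only a sketch, and the class name ZeroWidthMatchDemo and the helper matchSpans are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroWidthMatchDemo {
    // Collects "start-end" for every match the engine finds, in order.
    static List<String> matchSpans(String regex, String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // Unescaped pipes: one zero-width match at every position, including the end.
        System.out.println(matchSpans("|(\\w+)|", "Hey"));                  // [0-0, 1-1, 2-2, 3-3]

        // Escaped pipes: a single match covering |value|.
        System.out.println(matchSpans("\\|(\\w+)\\|", "my |value| worth")); // [3-10]
    }
}
```

Every match of the unescaped pattern has start equal to end, so nothing is ever consumed, and the four matches in "Hey" correspond to the four @ signs in @H@e@y@.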

Lucas Trzesniewski answered Sep 17 '22 15:09