Regex.Replace without line start and end terminators has some very strange effects.... What is going on here?

Tags:

c#

regex

replace

While answering the question "C# Regex Replace and *", the point was raised as to why the problem exists. While experimenting, I produced the following code:

    string s = Regex.Replace(".A.", @"\w*", "B");
    Console.Write(s);

This has the output: B.BB.B

I get that the zero-length string is matched before and after each . character, but why is A replaced by two Bs?

I could understand B.BBB.B (replacing the zero-length strings on either side of A) or B.B.B, but the actual result confuses me. Any help appreciated.

Or as AakashM has put it:

Why is Regex.Matches("A", @"\w*").Count equal to 2, not 1 or 3?

Matt Fellows asked Feb 10 '12 12:02


2 Answers

There is a star after \w.

It means "zero or more", so the input is processed like this:

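  • The first character is a dot, which is not \w, so there are zero \w characters here: the empty match is replaced by B
  • Next comes the dot itself, which is not replaced
  • A gets replaced by B
  • There are zero \w characters before the next dot: the empty match is replaced by B
  • The dot itself is not replaced
  • At the end of the string there are again zero \w characters, so the empty match is replaced by B once more.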

The expression \w{0,} has the same effect.

If you want to avoid it, use 'plus' which means 'at least one': \w+
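To make the difference between the three quantifiers concrete, here is a minimal sketch that runs each pattern against the input from the question. The class and method names (QuantifierComparison, ReplaceAll) are made up for illustration; only Regex.Replace comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierComparison
{
    // Hypothetical helper: replaces every match of the pattern with "B".
    public static string ReplaceAll(string input, string pattern) =>
        Regex.Replace(input, pattern, "B");

    public static void Main()
    {
        const string input = ".A.";

        // Verbatim strings (@"...") keep the backslash intact for the regex engine.
        Console.WriteLine(ReplaceAll(input, @"\w*"));    // B.BB.B (empty matches are replaced too)
        Console.WriteLine(ReplaceAll(input, @"\w{0,}")); // B.BB.B (same quantifier, different spelling)
        Console.WriteLine(ReplaceAll(input, @"\w+"));    // .B.    (at least one character required)
    }
}
```

With \w+ the pattern can no longer match the empty string, so only the A is replaced.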

Alexey Raga answered Sep 27 '22 21:09


That's the same behaviour as:

Regex.Replace("", @"\w*", "B") results in B
Regex.Replace("A", @"\w*", "B") results in BB

See it here on Regexr

For the string ".A.", \w* matches the empty string before the first dot, then the "A", then the empty string after the "A", and finally the empty string after the last dot.
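The four matches can be made visible by printing the index and length of each one. This is only a sketch: the MatchListing class and its Describe method are hypothetical names, while Regex.Matches, Match.Index, Match.Length and Match.Value are the standard .NET members.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchListing
{
    // Hypothetical helper: describes every match of the pattern in the input.
    public static List<string> Describe(string input, string pattern)
    {
        var lines = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            lines.Add($"index {m.Index}, length {m.Length}, value \"{m.Value}\"");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe(".A.", @"\w*"))
        {
            Console.WriteLine(line);
        }
        // index 0, length 0, value ""
        // index 1, length 1, value "A"
        // index 2, length 0, value ""
        // index 3, length 0, value ""
    }
}
```

Each of the four matches is replaced by one B, and the two dots are copied through unchanged, which gives B.BB.B.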

Explanation

You can think of the pattern as eating characters. \w* has eaten the "A"; the next character is a dot, so this match is complete and gets replaced. But the position from which the pattern continues matching is still between the A and the dot. The dot cannot be matched, so the pattern matches the empty string before the dot. That position is then done, and the next start position is after the dot.
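The "eating" model can be approximated with a manual loop. The sketch below is a rough emulation, not the engine's actual implementation: it assumes the next attempt starts where the previous match ended, and that an empty match forces a one-character step forward so the scan cannot get stuck. ScanSketch and Scan are hypothetical names; Regex.Match(string, int) is the real .NET overload that starts searching at a given position.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScanSketch
{
    // Hypothetical helper: returns "index:length" for each match found by the manual scan.
    public static List<string> Scan(string input, string pattern)
    {
        var regex = new Regex(pattern);
        var found = new List<string>();
        int pos = 0;

        // pos == input.Length is a valid start: an empty match is possible at the very end.
        while (pos <= input.Length)
        {
            Match m = regex.Match(input, pos);
            if (!m.Success) break;

            found.Add($"{m.Index}:{m.Length}");

            // Assumption: resume at the end of the match; after an empty match,
            // step one character forward to avoid matching the same spot forever.
            pos = m.Length == 0 ? m.Index + 1 : m.Index + m.Length;
        }
        return found;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Scan(".A.", @"\w*")));
        // 0:0, 1:1, 2:0, 3:0
    }
}
```

After the "A" is consumed (1:1), the scan resumes at position 2, where only the empty string matches (2:0). That is the second B in the middle of B.BB.B.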

stema answered Sep 27 '22 21:09