The following code...
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var r = new Regex("(.*)");
var c = "XYZ";
var uc = r.Replace(c, "A $1 B");
Console.WriteLine(uc);
}
}
.Net Fiddle Link
produces the following output...
A XYZ BA B
Do you think this is correct?
Shouldn't the output be...
A XYZ B
I think I am doing something stupid here. I would appreciate any help you can provide in helping me understand this issue.
Here is something interesting...
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var r = new Regex("(.*)");
var c = "XYZ";
var uc = r.Replace(c, "$1");
Console.WriteLine(uc);
}
}
.Net Fiddle
Output...
XYZ
As for why the engine returns 2 matches, it is due to the way .NET (also Perl and Java) handles global matching, i.e. find all matches to the given pattern in an input string.
The process can be described as followed (current index is usually set to 0 at the beginning of a search, unless specified):
$0
) is non-empty (at least one character is consumed), add the result and set current index to the end of main match ($0
). Then go to step 1.$0
) is empty:
The engine needs to check for empty match; otherwise, it will end up in an infinite loop. The designer recognizes the usage of empty match (in splitting a string into characters, for example), so the engine must be designed to avoid getting stuck at a certain position forever.
This process explains why there is an empty match at the end: since a search is conducted at the end of the string (index 3) after (.*)
matches abc
, and (.*)
can match an empty string, an empty match is found. And the engine does not produce infinite number of empty matches, since an empty match has already been found at the end.
a b c
^ ^ ^ ^
0 1 2 3
First match:
a b c
^ ^
0-----3
Second match:
a b c
^
3
With the global matching algorithm above, there can only be at most 2 matches starting at the same index, and such case can only happen when the first one is an empty match.
Note that JavaScript simply increment current index by 1 if the main match is empty, so there is at most 1 match per index. However, in this case (.*)
, if you use global flag g
to do global matching, the same result would happen:
(Result below is from Firefox, note the g
flag)
> "XYZ".replace(/(.*)/g, "A $1 B")
"A XYZ BA B"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With