I have written this very straightforward regex code:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace RegexTest1
{
    class Program
    {
        static void Main(string[] args)
        {
            string a = "\"foobar123==\"";
            Regex r = new Regex("^\"(.*)\"$");
            Match m = r.Match(a);
            if (m.Success)
            {
                foreach (Group g in m.Groups)
                {
                    Console.WriteLine(g.Index);
                    Console.WriteLine(g.Value);
                }
            }
        }
    }
}
```
However, the output is:
0
"foobar123=="
1
foobar123==
I don't understand why it prints twice. Why should there be a capture at index 0, when my regex starts with ^\" and I am not using a capturing group for that part?
Sorry if this is very basic, but I don't write regexes on a daily basis.
As I understand it, this code should print only once, with index 1 and value foobar123==.
This happens because group zero is special: it returns the entire match.
From the Regex documentation (emphasis added):
A simple regular expression pattern illustrates how numbered (unnamed) and named groups can be referenced either programmatically or by using regular expression language syntax. The regular expression
((?<One>abc)\d+)?(?<Two>xyz)(.*)
produces the following capturing groups by number and by name. The first capturing group (number 0) always refers to the entire pattern.
#   Name              Group
-   ----------------  --------------------------------
0   0 (default name)  ((?<One>abc)\d+)?(?<Two>xyz)(.*)
1   1 (default name)  ((?<One>abc)\d+)
2   2 (default name)  (.*)
3   One               (?<One>abc)
4   Two               (?<Two>xyz)
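To see this numbering directly, the sketch below asks the regex itself for its group numbers and names via Regex.GetGroupNumbers and Regex.GroupNameFromNumber. The class and helper names (GroupNumbering, Describe) are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupNumbering
{
    // Hypothetical helper: returns "number name" for every capturing group,
    // in the order reported by Regex.GetGroupNumbers().
    public static string[] Describe(string pattern)
    {
        var regex = new Regex(pattern);
        int[] numbers = regex.GetGroupNumbers();
        var result = new string[numbers.Length];
        for (int i = 0; i < numbers.Length; i++)
        {
            // Unnamed groups report their number as their name.
            result[i] = numbers[i] + " " + regex.GroupNameFromNumber(numbers[i]);
        }
        return result;
    }

    static void Main()
    {
        foreach (string line in Describe(@"((?<One>abc)\d+)?(?<Two>xyz)(.*)"))
        {
            Console.WriteLine(line);
        }
    }
}
```

For this pattern the listing should agree with the table: the unnamed groups get 0, 1 and 2, and the named groups One and Two are numbered after them as 3 and 4.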
If you do not want to see the entire match, start the output from group 1 instead of group 0.
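A minimal sketch of that, assuming you only need each captured group's index and value: loop over m.Groups by index, starting at 1. The names SkipGroupZero and CapturedGroups are hypothetical, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class SkipGroupZero
{
    // Hypothetical helper: collects (Index, Value) for every group except group 0.
    public static List<(int Index, string Value)> CapturedGroups(string input, string pattern)
    {
        var result = new List<(int Index, string Value)>();
        Match m = Regex.Match(input, pattern);
        if (m.Success)
        {
            // Start at 1: m.Groups[0] is always the entire match.
            for (int i = 1; i < m.Groups.Count; i++)
            {
                result.Add((m.Groups[i].Index, m.Groups[i].Value));
            }
        }
        return result;
    }

    static void Main()
    {
        string a = "\"foobar123==\"";
        foreach (var (index, value) in CapturedGroups(a, "^\"(.*)\"$"))
        {
            Console.WriteLine(index);
            Console.WriteLine(value);
        }
    }
}
```

With the input from the question this prints only 1 and foobar123==, which is what the asker expected.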
A regex captures several groups at once. Group 0 is the entire matched region (including the quotation marks). Group 1 is the group defined by the parentheses.
Say your regex has the form A(B(C)D)E, where A, B, C, D and E are regular subexpressions.
Then the following groups will be matched:
0 A(B(C)D)E
1 B(C)D
2 C
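The i-th group starts at the i-th opening parenthesis. You can think of a "zeroth" opening parenthesis as implicitly placed at the beginning of the regex, with its closing parenthesis at the end. This counting applies to unnamed groups; in .NET, named groups are numbered after all unnamed ones, as the documentation table above shows.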
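A small sketch of this numbering, using the literal letters a to e as stand-ins for the subexpressions A to E (an assumption made purely for illustration; NestedGroups and GroupValues are hypothetical names):

```csharp
using System;
using System.Text.RegularExpressions;

class NestedGroups
{
    // Hypothetical helper: returns the value of every group, indexed by group number.
    public static string[] GroupValues(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        var values = new string[m.Groups.Count];
        for (int i = 0; i < m.Groups.Count; i++)
        {
            values[i] = m.Groups[i].Value;
        }
        return values;
    }

    static void Main()
    {
        // Literal letters stand in for the subexpressions A..E.
        string[] values = GroupValues("abcde", "a(b(c)d)e");
        for (int i = 0; i < values.Length; i++)
        {
            Console.WriteLine(i + " " + values[i]);
        }
    }
}
```

Running it should print group 0 as abcde, group 1 as bcd and group 2 as c, mirroring the A(B(C)D)E, B(C)D, C list.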
If you want to omit group 0, you can use LINQ's Skip method:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace RegexTest1 {
    class Program {
        static void Main(string[] args) {
            string a = "\"foobar123==\"";
            Regex r = new Regex("^\"(.*)\"$");
            Match m = r.Match(a);
            if (m.Success) {
                // GroupCollection is only a non-generic IEnumerable on .NET Framework, and on
                // newer .NET it also enumerates KeyValuePair<string, Group>, so cast before Skip.
                foreach (Group g in m.Groups.Cast<Group>().Skip(1)) { // skip the first element, i.e. group 0
                    Console.WriteLine(g.Index);
                    Console.WriteLine(g.Value);
                }
            }
        }
    }
}
```