This question is, in a way, a continuation of my previously answered question: Getting "Unterminated [] set." Error in C#
I'm using a regular expression in C# to extract URLs:
Regex find = new Regex(@"(?<First>[,""]url=)(?<Url>[^\\]+)(?<Last>\\u00)");
Where the text contains URLs in the format:
,url=http://domain.com?itag=25\u0026,url=http://hello.com?itag=11\u0026
I'm getting the entire URL in 'Url' group, but I'd also like to have the itag value in a separate "iTag" group. I know this can be done using sub-groups and I've been trying but can't figure out exactly how to do this.
You already have named groups defined in the regex. The syntax ?<First> names everything within those parentheses First. When you match using Regex, use the Groups property to access the GroupCollection and extract a group value by name.
var first = regex.Match(line).Groups["First"].Value;
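To make the lookup above concrete, here is a minimal sketch that reuses the pattern from the question. The class name NamedGroupDemo and the helper TryGetUrl are hypothetical names chosen for illustration, and the sample line assumes the text contains a literal backslash followed by u0026, as the question's pattern suggests. It also checks Match.Success, since a failed match returns empty group values rather than throwing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Pattern from the question; the verbatim string keeps the backslashes literal.
    private static readonly Regex Find =
        new Regex(@"(?<First>[,""]url=)(?<Url>[^\\]+)(?<Last>\\u00)");

    // Hypothetical helper: reports whether the line matched and, if so, the Url group.
    public static bool TryGetUrl(string line, out string url)
    {
        Match match = Find.Match(line);
        url = match.Success ? match.Groups["Url"].Value : string.Empty;
        return match.Success;
    }

    public static void Main()
    {
        // Verbatim string: the text holds a literal backslash followed by u0026.
        string line = @",url=http://domain.com?itag=25\u0026";
        if (TryGetUrl(line, out string url))
        {
            Console.WriteLine(url); // http://domain.com?itag=25
        }
    }
}
```

Groups can also be read by number, but looking them up by name keeps the code stable if the pattern gains more groups later.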
The following pattern adds a nested iTag group while keeping the full URL in the Url group. To exclude the itag part from Url, move the iTag group outside the Url parentheses.
(?<First>[,""]url=)(?<Url>[^\?]+\?itag=(?<iTag>[0-9]*))(?<Last>\\u0026)
Here's the code.
// Regular (non-verbatim) string: "\\u0026" reaches the regex engine as \u0026, which matches '&'.
Regex regex = new Regex("(?<First>[,\"]url=)(?<Url>[^\\?]*\\?itag=(?<iTag>[0-9]*))(?<Last>\\u0026)");
// Here \u0026 is a C# escape for '&', so this input contains "&", not a literal backslash.
string input = ",url=http://domain.com?itag=25\u0026,url=http://hello.com?itag=11\u0026";
foreach (Match match in regex.Matches(input))
{
    System.Console.WriteLine("1. " + match);
    System.Console.WriteLine(" 1. " + match.Groups["First"]);
    System.Console.WriteLine(" 2. " + match.Groups["Url"]);
    System.Console.WriteLine(" 3. " + match.Groups["iTag"]);
    System.Console.WriteLine(" 4. " + match.Groups["Last"]);
}
Results:
1. ,url=http://domain.com?itag=25&
1. ,url=
2. http://domain.com?itag=25
3. 25
4. &
1. ,url=http://hello.com?itag=11&
1. ,url=
2. http://hello.com?itag=11
3. 11
4. &
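As a variation, the sketch below moves the iTag group outside the Url parentheses, so Url holds only the part before ?itag=. It assumes the real input contains the literal characters \u0026 (a backslash followed by u0026), as the pattern in the question implies, rather than an & character. The class name ITagDemo and the method Extract are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ITagDemo
{
    // iTag sits outside Url here, so Url stops before "?itag=".
    // In a verbatim string, \\u0026 is a regex for a literal backslash plus "u0026".
    private static readonly Regex Find = new Regex(
        @"(?<First>[,""]url=)(?<Url>[^\?]+)\?itag=(?<iTag>[0-9]*)(?<Last>\\u0026)");

    // Hypothetical helper: returns (Url, iTag) pairs for every match in the input.
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match match in Find.Matches(input))
        {
            pairs.Add(new KeyValuePair<string, string>(
                match.Groups["Url"].Value, match.Groups["iTag"].Value));
        }
        return pairs;
    }

    public static void Main()
    {
        // Verbatim string: each URL ends with a literal backslash followed by u0026.
        string input = @",url=http://domain.com?itag=25\u0026,url=http://hello.com?itag=11\u0026";
        foreach (var pair in Extract(input))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
        // Prints:
        // http://domain.com -> 25
        // http://hello.com -> 11
    }
}
```

Whether to nest iTag inside Url or place it after depends on whether the caller needs the full URL including the query string.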