
How do I access named capturing groups in a .NET Regex?

Tags: c#, .net, regex


Use the group collection of the Match object, indexing it with the capturing group name, e.g.

foreach (Match m in mc){
    MessageBox.Show(m.Groups["link"].Value);
}
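
The snippet above assumes that `mc` is a `MatchCollection` produced elsewhere. As a rough, self-contained sketch (the pattern, the input string, and the `LinkExtractor` class are made up for illustration and are not from the question), it could be filled in like this:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Hypothetical pattern: captures a double-quoted href value into the group "link".
    private static readonly Regex HrefRegex = new Regex("href=\"(?<link>[^\"]*)\"");

    // Returns the text captured by the "link" group for every match in the input.
    public static List<string> ExtractLinks(string input)
    {
        var links = new List<string>();
        MatchCollection mc = HrefRegex.Matches(input);
        foreach (Match m in mc)
        {
            links.Add(m.Groups["link"].Value);
        }
        return links;
    }

    public static void Main()
    {
        // Assumed sample input with two anchors.
        string html = "<a href=\"/first\">one</a> <a href=\"/second\">two</a>";
        foreach (string link in ExtractLinks(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

Collecting the values into a list instead of showing a message box keeps the sketch independent of any UI framework.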

Pass the name of the capture group as a string to the indexer of the Groups property of the resulting Match object.

Here is a small example:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        String sample = "hello-world-";
        Regex regex = new Regex("-(?<test>[^-]*)-");

        Match match = regex.Match(sample);

        if (match.Success)
        {
            // The first "-...-" span in the sample is "-world-", so this prints "world".
            Console.WriteLine(match.Groups["test"].Value);
        }
    }
}
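
A related detail: indexing `Groups` by name does not throw when the group took no part in the match. The returned `Group` simply has `Success == false` and an empty `Value`, so optional groups can be checked before use. A small sketch, assuming a hypothetical `key=value` pattern in which the `value` group is optional:

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Hypothetical pattern: "key" is required, "value" is optional.
    private static readonly Regex PairRegex =
        new Regex(@"^(?<key>\w+)(?:=(?<value>\w+))?$");

    public static string Describe(string input)
    {
        Match match = PairRegex.Match(input);
        if (!match.Success)
        {
            return "no match";
        }

        // Group.Success tells us whether this particular group captured anything.
        Group value = match.Groups["value"];
        return value.Success
            ? match.Groups["key"].Value + " -> " + value.Value
            : match.Groups["key"].Value + " (no value)";
    }

    public static void Main()
    {
        Console.WriteLine(Describe("color=red")); // color -> red
        Console.WriteLine(Describe("color"));     // color (no value)
    }
}
```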

The following code sample matches the pattern even when there are whitespace characters in between, i.e.:

<td><a href='/path/to/file'>Name of File</a></td>

as well as:

<td> <a      href='/path/to/file' >Name of File</a>  </td>

The method returns true or false, depending on whether the input htmlTd string matches the pattern. If it matches, the out parameters contain the link and the name respectively.

/// <summary>
/// Assigns proper values to link and name, if the htmlTd matches the pattern
/// </summary>
/// <returns>true if success, false otherwise</returns>
public static bool TryGetHrefDetails(string htmlTd, out string link, out string name)
{
    link = null;
    name = null;

    // The href value may be double-quoted, single-quoted, or unquoted;
    // all three alternatives capture into the same named group "link".
    string pattern = "<td>\\s*<a\\s+href\\s*=\\s*(?:\"(?<link>[^\"]*)\"|'(?<link>[^']*)'|(?<link>[^\\s>]+))\\s*>(?<name>.*?)\\s*</a>\\s*</td>";

    // Match once, with the same options for the test and the extraction,
    // and read both named groups from that single Match.
    Match m = Regex.Match(htmlTd, pattern, RegexOptions.IgnoreCase);
    if (!m.Success)
        return false;

    link = m.Groups["link"].Value;
    name = m.Groups["name"].Value;
    return true;
}

I have tested this and it works correctly.


Additionally, if someone has a use case where they need the group names before executing a search with the Regex object, they can use:

var regex = new Regex(pattern); // initialized somewhere
// ...
var groupNames = regex.GetGroupNames();
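
To make that concrete, here is a minimal sketch that lists the group names and their numbers for the `-(?<test>[^-]*)-` pattern used earlier. The `GroupNamesDemo` class and `NamesOf` helper are hypothetical names for illustration. Note that the returned array also contains "0", the name of the implicit group for the whole match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNamesDemo
{
    // Returns the names of all groups defined by the pattern, including "0".
    public static string[] NamesOf(string pattern)
    {
        return new Regex(pattern).GetGroupNames();
    }

    public static void Main()
    {
        var regex = new Regex("-(?<test>[^-]*)-");
        foreach (string name in regex.GetGroupNames())
        {
            // GroupNumberFromName maps each name back to its numeric index.
            Console.WriteLine(name + " = group " + regex.GroupNumberFromName(name));
        }
    }
}
```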