C# Regex: Named Group Valid Characters?

Tags: c#, regex

What constitutes a valid group name?

var re = new Regex(@"(?<what-letters-can-go-here>pattern)");
mpen asked Nov 24 '10 21:11


2 Answers

Short Answer

In practice, the allowed characters are [a-zA-Z0-9_], and the name cannot begin with a digit.

Long Answer

According to the Microsoft docs:

name must not contain any punctuation characters and cannot begin with a number.

But that's not very specific, so let's look at the source code:

The source code for System.Text.RegularExpressions.RegexParser shows that the allowed characters are essentially [a-zA-Z0-9_]. To be precise, though, the method that checks whether a character is valid in a capturing group name carries this comment:

internal static bool IsWordChar(char ch) {
    // According to UTS#18 Unicode Regular Expressions (http://www.unicode.org/reports/tr18/)
    // RL 1.4 Simple Word Boundaries  The class of <word_character> includes all Alphabetic
    // values from the Unicode character database, from UnicodeData.txt [UData], plus the U+200C
    // ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER.
    return CharInClass(ch, WordClass) || ch == ZeroWidthJoiner || ch == ZeroWidthNonJoiner;
}

If you want to test it yourself, this .NET fiddle confirms that many non-punctuation characters are also not allowed in the name of a capturing group.
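One way to check a candidate name locally is to build a Regex around it and see whether the parser rejects the pattern. The following is a minimal sketch, not the original fiddle: the class GroupNameProbe, the method IsValidGroupName and the sample names are hypothetical and chosen only for illustration. It assumes the Regex constructor throws ArgumentException for a pattern it cannot parse.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNameProbe
{
    // Hypothetical helper: returns true if .NET accepts `name`
    // as the name of a capturing group.
    public static bool IsValidGroupName(string name)
    {
        try
        {
            // Wrap a trivial pattern in a named group using the candidate name.
            var re = new Regex($"(?<{name}>x)");

            // Guard against names like "a>b", where the pattern parses but
            // the group ends up registered under a different name ("a").
            return re.GroupNumberFromName(name) != -1;
        }
        catch (ArgumentException)
        {
            // The regex parser rejected the pattern.
            return false;
        }
    }

    public static void Main()
    {
        // Sample names picked for illustration only.
        var candidates = new[] { "word_1", "größe", "1abc", "my-name", "has space", "a.b" };
        foreach (var name in candidates)
        {
            Console.WriteLine($"{name}: {IsValidGroupName(name)}");
        }
    }
}
```

Note that a hyphen has its own meaning inside (?<...>): .NET reads (?<a-b>...) as a balancing group definition, so a hyphenated name is not treated as a single group name.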

Josh Withee answered Oct 07 '22 22:10


Anything matched by \w, which is effectively [a-zA-Z0-9_].

I have not confirmed this, however.
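A quick way to probe the \w claim is to compare the default behavior with RegexOptions.ECMAScript, which restricts \w to [a-zA-Z0-9_]. This is only a sketch under stated assumptions: the class WordCharCheck, its two helper methods and the sample word are hypothetical names used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharCheck
{
    // Default .NET behavior: \w is Unicode-aware.
    public static bool IsWord(string s) => Regex.IsMatch(s, @"^\w+$");

    // With RegexOptions.ECMAScript, \w is limited to [a-zA-Z0-9_].
    public static bool IsAsciiWord(string s) =>
        Regex.IsMatch(s, @"^\w+$", RegexOptions.ECMAScript);

    public static void Main()
    {
        // A word containing non-ASCII letters.
        Console.WriteLine(IsWord("größe"));
        Console.WriteLine(IsAsciiWord("größe"));

        // The same letters used as a capturing group name.
        var m = Regex.Match("42", @"(?<größe>\d+)");
        Console.WriteLine(m.Groups["größe"].Value);
    }
}
```

If the first check succeeds and the second fails, then "effectively [a-zA-Z0-9_]" is an ASCII approximation of \w rather than an exact description.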

Paul Creasey answered Oct 07 '22 23:10
