Is there an easy way to match all punctuation except period and underscore, in a C# regex? Hoping to do it without enumerating every single punctuation mark.
To match any character except a list of excluded characters, put the excluded characters between [^ and ]. The caret ^ must immediately follow the [, or else it stands for just itself. The period (.) is a metacharacter: outside a character class it has a special meaning, but inside square brackets it is just a literal period.
In regular expressions, . matches any single character (by default, any character except a newline). * means zero or more occurrences of the single element preceding it.
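As a rough illustration of the points above, here is a minimal sketch of a negated character class next to the bare . and * metacharacters. The class name NegatedClassDemo and the helper KeepOnlyDotAndUnderscore are made up for this example; only Regex.Replace and Regex.IsMatch come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Hypothetical helper: removes every character that is NOT '.' or '_'.
    public static string KeepOnlyDotAndUnderscore(string input)
    {
        // [^._] matches any single character except '.' and '_'.
        // Inside the brackets the period is a literal, not a wildcard.
        return Regex.Replace(input, @"[^._]", "");
    }

    public static void Main()
    {
        Console.WriteLine(KeepOnlyDotAndUnderscore("a.b_c!")); // prints "._"

        // Outside a character class, an unescaped '.' matches any character.
        Console.WriteLine(Regex.IsMatch("x", @"^.$"));   // True
        Console.WriteLine(Regex.IsMatch("x", @"^\.$"));  // False

        // '*' applies only to the single element before it (here, the 'b').
        Console.WriteLine(Regex.IsMatch("a", @"^ab*$")); // True
    }
}
```

Note that a negated class like [^._] also matches letters, digits and whitespace, so on its own it does not answer the "punctuation only" question.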
Some punctuation has special meaning in regular expressions. It can get confusing if you are searching for things like question marks, periods, and parentheses. For example, a period means "match any character." The easiest way to get around this is to "escape" the character by putting a backslash in front of it, so \. matches a literal period.
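To make the escaping point concrete, here is a small sketch comparing an unescaped and an escaped period, plus Regex.Escape for escaping a whole literal string. The class EscapeDemo and the helper CountLiteral are hypothetical names for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: counts occurrences of a literal string.
    public static int CountLiteral(string input, string literal)
    {
        // Regex.Escape backslash-escapes metacharacters such as . ? ( )
        return Regex.Matches(input, Regex.Escape(literal)).Count;
    }

    public static void Main()
    {
        // Unescaped '.' matches every character of the 5-character input.
        Console.WriteLine(Regex.Matches("a.b?c", @".").Count);   // 5

        // Escaped '\.' matches only the literal period.
        Console.WriteLine(Regex.Matches("a.b?c", @"\.").Count);  // 1

        // Parentheses and '?' are treated as plain text after escaping.
        Console.WriteLine(CountLiteral("f(x)? g(x)?", "(x)?"));  // 2
    }
}
```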
Use Regex Subtraction
[\p{P}-[._]]
Here's the link for .NET Regex documentation (I'm not sure if other flavors support it)... http://msdn.microsoft.com/en-us/library/ms994330.aspx
Here's a C# example
string pattern = @"[\p{P}\p{S}-[._]]"; // added \p{S} to get ^,~ and ` (among others)
string test = @"_""'a:;%^&*~`bc!@#.,?";
MatchCollection mx = Regex.Matches(test, pattern);
foreach (Match m in mx)
{
    Console.WriteLine("{0}: {1} {2}", m.Value, m.Index, m.Length);
}
Explanation
The pattern is a character class subtraction. It starts with a standard character class such as [\p{P}] and then adds a subtraction character class, -[._], which says to remove the . and _ from the set. The subtraction goes inside the outer [ ], after the contents of the standard class.
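As one possible way to put the subtraction to use, the sketch below wraps the pattern in a helper that strips punctuation and symbols while keeping periods and underscores. The names SubtractionDemo and StripPunctuation are assumptions for this example, not part of the original answer. The last two lines show why \p{S} was added: ^ belongs to a Unicode symbol category rather than a punctuation category.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // All punctuation (\p{P}) and symbols (\p{S}), minus '.' and '_'.
    private static readonly Regex Unwanted = new Regex(@"[\p{P}\p{S}-[._]]");

    // Hypothetical helper: removes the unwanted characters from the input.
    public static string StripPunctuation(string input)
    {
        return Unwanted.Replace(input, "");
    }

    public static void Main()
    {
        // prints "hello world file_name.txt"
        Console.WriteLine(StripPunctuation("hello, world! file_name.txt"));

        // '^' is classified as a symbol, not as punctuation.
        Console.WriteLine(Regex.IsMatch("^", @"\p{P}")); // False
        Console.WriteLine(Regex.IsMatch("^", @"\p{S}")); // True
    }
}
```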