Regex: Match any punctuation character except . and _

Tags: c#, .net, regex

Is there an easy way to match all punctuation except the period and the underscore in a C# regex? I'm hoping to do it without enumerating every single punctuation mark.

Smashery asked Oct 19 '10 23:10



1 Answer

Use character class subtraction:

[\p{P}-[._]] 

Here's the link to the .NET regex documentation (I'm not sure whether other flavors support this syntax): http://msdn.microsoft.com/en-us/library/ms994330.aspx

Here's a C# example

string pattern = @"[\p{P}\p{S}-[._]]"; // added \p{S} to get ^, ~ and ` (among others)
string test = @"_""'a:;%^&*~`bc!@#.,?";
MatchCollection mx = Regex.Matches(test, pattern);
foreach (Match m in mx)
{
    Console.WriteLine("{0}: {1} {2}", m.Value, m.Index, m.Length);
}

Explanation

The pattern uses character class subtraction. On its own, \p{P} matches both . and _, so they have to be taken out explicitly. The class starts with a standard character class such as [\p{P}] and then appends a subtracted class, -[._], which removes . and _ from the set. The subtraction goes inside the outer [ ], after the rest of the class contents, and must be the last thing in the class.
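To see the subtraction at work, here is a minimal sketch that strips every punctuation mark except . and _ from a string. The class name PunctuationDemo, the helper StripPunctuation and the sample input are made up for illustration. Only Regex.IsMatch and Regex.Replace are real .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the answer above.
public static class PunctuationDemo
{
    // All Unicode punctuation (\p{P}) minus '.' and '_'.
    private static readonly Regex PunctuationExceptDotUnderscore =
        new Regex(@"[\p{P}-[._]]");

    // Removes every character matched by the pattern above.
    public static string StripPunctuation(string input)
    {
        return PunctuationExceptDotUnderscore.Replace(input, "");
    }

    public static void Main()
    {
        // Without the subtraction, both characters count as punctuation.
        Console.WriteLine(Regex.IsMatch(".", @"\p{P}")); // True
        Console.WriteLine(Regex.IsMatch("_", @"\p{P}")); // True

        // With the subtraction, neither character matches.
        Console.WriteLine(PunctuationExceptDotUnderscore.IsMatch("._")); // False

        // The comma, parentheses and exclamation mark are removed.
        Console.WriteLine(StripPunctuation("file_name.v2, (final)!")); // file_name.v2 final
    }
}
```

If you also need symbols such as ^, ~ and `, add \p{S} before the subtraction, as in the example above.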

Les answered Sep 16 '22 14:09