Regex to replace invalid characters

Tags: c#, regex

I don't have much experience with RegEx so I am using many chained String.Replace() calls to remove unwanted characters -- is there a RegEx I can write to streamline this?

string messyText = GetText();
string cleanText = messyText.Trim()
         .ToUpper()
         .Replace(",", "")
         .Replace(":", "")
         .Replace(".", "")
         .Replace(";", "")
         .Replace("/", "")
         .Replace("\\", "")
         .Replace("\n", "")
         .Replace("\t", "")
         .Replace("\r", "")
         .Replace(Environment.NewLine, "")
         .Replace(" ", "");

Thanks

cordialgerm asked Oct 07 '10 21:10


2 Answers

Try this regex:

Regex regex = new Regex(@"[\s,:.;/\\]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

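\s is a shorthand character class for whitespace. It covers [ \t\r\n] and, in .NET, also form feed, vertical tab and Unicode separator characters such as the non-breaking space.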
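To see how far \s reaches in .NET, here is a minimal sketch that compares it with the explicit class [ \t\r\n]. The class name, the two helper methods and the sample string are made up for the demo and are not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceClassDemo
{
    // Removes everything .NET treats as \s (includes U+00A0 and form feed).
    public static string StripWhitespace(string s) =>
        Regex.Replace(s, @"\s+", "");

    // Removes only space, tab, carriage return and line feed.
    public static string StripBasicWhitespace(string s) =>
        Regex.Replace(s, @"[ \t\r\n]+", "");

    public static void Main()
    {
        // Hypothetical input: non-breaking space, form feed and a regular space.
        string sample = "a\u00A0b\fc d";

        Console.WriteLine(StripWhitespace(sample));              // abcd
        Console.WriteLine(StripBasicWhitespace(sample).Length);  // 6 (only the regular space is removed)
    }
}
```

If the input can only ever contain the characters from the question, both patterns give the same result; the difference only matters for less common whitespace.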


If you just want to preserve alphanumeric characters, instead of adding every non-alphanumeric character in existence to the character class, you could do this:

Regex regex = new Regex(@"[\W_]+");
string cleanText = regex.Replace(messyText, "").ToUpper();

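Here \W matches any non-word character, roughly [^a-zA-Z0-9_]. In .NET it is Unicode-aware by default, so accented letters and non-ASCII digits also count as word characters. The underscore is listed separately in the class because \W does not match it.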
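Because \W follows Unicode rules in .NET, non-ASCII letters survive the [\W_]+ pattern. The sketch below contrasts it with an explicit ASCII-only class; the class name, helper names and sample input are assumptions for illustration, and ToUpperInvariant is used in place of ToUpper only to keep the result independent of the current culture.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // Keeps anything .NET considers a word character, except the underscore.
    public static string KeepWordChars(string s) =>
        Regex.Replace(s, @"[\W_]+", "").ToUpperInvariant();

    // Keeps ASCII letters and digits only.
    public static string KeepAsciiAlphanumerics(string s) =>
        Regex.Replace(s, @"[^a-zA-Z0-9]+", "").ToUpperInvariant();

    public static void Main()
    {
        // Hypothetical input; \u00E9 is a lowercase e with an acute accent.
        string sample = "caf\u00E9_au-lait 42";

        string unicodeAware = KeepWordChars(sample);          // "CAF\u00C9AULAIT42"
        string asciiOnly = KeepAsciiAlphanumerics(sample);    // "CAFAULAIT42"

        Console.WriteLine(unicodeAware.Length);  // 12
        Console.WriteLine(asciiOnly);            // CAFAULAIT42
    }
}
```

Which of the two is appropriate depends on whether accented or non-Latin letters should count as valid characters in the cleaned text.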

999999 answered Oct 13 '22 18:10


Character classes to the rescue!

string messyText = GetText();
string cleanText = Regex.Replace(messyText.Trim().ToUpper(), @"[,:.;/\\\n\t\r ]+", "");
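As a quick sanity check, the following sketch runs the chained calls from the question and the single character class side by side. The class name, the helper names and the sample string are hypothetical, and GetText() is replaced by a literal so the program is self-contained.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ChainVersusRegex
{
    // The chained String.Replace() calls from the question.
    public static string ChainedClean(string s) =>
        s.Trim().ToUpper()
         .Replace(",", "").Replace(":", "").Replace(".", "").Replace(";", "")
         .Replace("/", "").Replace("\\", "")
         .Replace("\n", "").Replace("\t", "").Replace("\r", "")
         .Replace(Environment.NewLine, "")
         .Replace(" ", "");

    // The same set of characters removed in one pass.
    public static string RegexClean(string s) =>
        Regex.Replace(s.Trim().ToUpper(), @"[,:.;/\\\n\t\r ]+", "");

    public static void Main()
    {
        // Hypothetical input containing every character the question removes.
        string sample = " a,b: c.d;\te/f\\g\r\n";

        Console.WriteLine(ChainedClean(sample));  // ABCDEFG
        Console.WriteLine(RegexClean(sample));    // ABCDEFG
    }
}
```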
kevingessner answered Oct 13 '22 18:10