
Regex to remove all special characters from string?

Tags:

string

c#

regex

I'm completely incapable of regular expressions, and so I need some help with a problem that I think would best be solved by using regular expressions.

I have a list of strings in C#:

    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    List<string> lstNames = new List<string>();
    lstNames.Add("TRA-94:23");
    lstNames.Add("TRA-42:101");
    lstNames.Add("TRA-109:AD");

    foreach (string n in lstNames)
    {
        // logic goes here that somehow uses regex to remove all special characters
        string regExp = "NO_IDEA";
        string tmp = Regex.Replace(n, regExp, "");
    }

I need to be able to loop over the list and return each item without any special characters. For example, item one would be "TRA9423", item two would be "TRA42101", and item three would be "TRA109AD".

Is there a regular expression that can accomplish this for me?

Also, the list contains more than 4000 items, so I need the search and replace to be efficient and quick if possible.

EDIT: I should have specified that, in my case, any character other than a-z, A-Z, and 0-9 counts as special.

Asked by Jagd on Jul 21 '10 at 20:07


People also ask

How do I remove a specific character from a string in regex?

If you have a string containing special characters and want to remove or replace them, you can use a regex. For example: Regex.Replace(yourString, @"[^0-9a-zA-Z]+", "")

What is the regex for special characters?

Special regex characters: the following characters have a special meaning in a regex: . + * ? ^ $ ( ) [ ] { } | \. Escape sequences (\char): to match one of these characters literally, prefix it with a backslash (\).
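To illustrate the escaping rule, here is a minimal C# sketch (assuming C# 9 or later, so it runs as top-level statements) that contrasts an unescaped . with an escaped one. It uses .NET's Regex.IsMatch and Regex.Escape; the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// "." is a metacharacter: unescaped, it matches any single character.
bool unescaped = Regex.IsMatch("axb", "a.b");

// Prefixing it with a backslash makes it match a literal dot only.
bool escapedByHand = Regex.IsMatch("axb", @"a\.b");

// Regex.Escape inserts the backslashes for you, which is handy when
// the text to match comes from user input or data.
string escapedPattern = Regex.Escape("a.b");
bool escapedByLibrary = Regex.IsMatch("axb", escapedPattern);

Console.WriteLine(unescaped);        // True
Console.WriteLine(escapedByHand);    // False
Console.WriteLine(escapedPattern);   // a\.b
Console.WriteLine(escapedByLibrary); // False
```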


2 Answers

It really depends on your definition of special characters. I find that a whitelist rather than a blacklist is the best approach in most situations:

    tmp = Regex.Replace(n, "[^0-9a-zA-Z]+", "");
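Applied to the list from the question, the whitelist pattern might look like the sketch below. It assumes C# 9 or later (top-level statements); the names nonAlphanumeric and cleaned are illustrative. Building a single Regex instance with RegexOptions.Compiled is one common option when the same pattern runs over thousands of strings, but its benefit for a 4000-item list has not been measured here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Whitelist: every run of characters that is not an ASCII letter or digit is removed.
// RegexOptions.Compiled trades a one-time startup cost for faster matching,
// which may help when the same pattern is applied to many strings.
var nonAlphanumeric = new Regex("[^0-9a-zA-Z]+", RegexOptions.Compiled);

var lstNames = new List<string> { "TRA-94:23", "TRA-42:101", "TRA-109:AD" };
var cleaned = new List<string>(lstNames.Count);

foreach (string n in lstNames)
{
    cleaned.Add(nonAlphanumeric.Replace(n, ""));
}

Console.WriteLine(string.Join(", ", cleaned)); // TRA9423, TRA42101, TRA109AD
```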

You should be careful with your current approach, because the following two items will both be converted to "TRA12123" and will therefore be indistinguishable:

    "TRA-12:123"
    "TRA-121:23"
Answered by Mark Byers on Oct 13 '22 at 10:10
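The collision described in this answer can be detected before it causes trouble. A minimal sketch, assuming C# 9 or later and using LINQ's GroupBy to bucket the original names by their cleaned form; the names array and the collisions variable are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] names = { "TRA-12:123", "TRA-121:23" };

// Group the original names by their cleaned form; any group with more
// than one member means distinct inputs collapsed into the same key.
var collisions = names
    .GroupBy(n => Regex.Replace(n, "[^0-9a-zA-Z]+", ""))
    .Where(g => g.Count() > 1)
    .ToList();

foreach (var group in collisions)
{
    Console.WriteLine($"{group.Key} <- {string.Join(", ", group)}");
}
// TRA12123 <- TRA-12:123, TRA-121:23
```

If the cleaned strings are used as keys (for example in a dictionary), a check like this shows which originals would overwrite each other.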


This should do it:

    [^a-zA-Z0-9]

Basically, it matches every character that is not an ASCII letter or digit.
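A short sketch comparing this pattern with the + variant from the other answer, assuming C# 9 or later. Both produce the same string; the + only changes how many matches the engine performs. The KeepAsciiAlphanumeric helper is a hypothetical regex-free alternative that neither answer proposes, included only for comparison; whether it is faster on the asker's data would need measuring.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

string input = "TRA-109:AD";

// Without "+", each non-alphanumeric character is matched and removed
// individually; with "+", a run of them is removed in one match.
// The resulting string is the same either way.
string perChar = Regex.Replace(input, "[^a-zA-Z0-9]", "");
string perRun = Regex.Replace(input, "[^a-zA-Z0-9]+", "");

// Hypothetical regex-free alternative: keep only ASCII letters and digits.
// char.IsLetterOrDigit is avoided on purpose because it also accepts
// non-ASCII letters such as 'é', which the question's EDIT rules out.
string KeepAsciiAlphanumeric(string s)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
        {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

string manual = KeepAsciiAlphanumeric(input);

Console.WriteLine($"{perChar} {perRun} {manual}"); // TRA109AD TRA109AD TRA109AD
```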

Answered by Daniel Egeberg on Oct 13 '22 at 09:10