How do I strip non-alphanumeric characters from a string and lose the spaces in C# with Replace?
I want to keep a-z, A-Z, 0-9 and nothing more (not even " " spaces).
"Hello there(hello#)".Replace(regex-i-want, "");
should give
"Hellotherehello"
I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", "");
but the spaces remain.
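A common way to remove all non-alphanumeric characters from a string is a regular expression. The pattern [^A-Za-z0-9] matches every character that is not an ASCII letter or digit, so replacing its matches with an empty string keeps only the alphanumeric characters. The pattern [^\w] is sometimes suggested as well, but it also keeps the underscore, and in .NET \w matches Unicode letters and digits by default; it is equivalent to [^a-zA-Z_0-9] only when RegexOptions.ECMAScript is set.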
To do this in C#, call Regex.Replace, passing the input string, a regular expression that matches all non-alphanumeric characters, and an empty string as the replacement. The method returns a new string with all matches removed; the original string is not modified.
In Java, the equivalent approach is the String.replaceAll method, which likewise replaces every match of the pattern with the given string.
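As a small sketch of how [^\w] differs from [^A-Za-z0-9] in .NET: by default \w is Unicode-aware and always includes the underscore, while RegexOptions.ECMAScript narrows it to [a-zA-Z_0-9]. The input string and variable names below are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input chosen to contain an underscore and a non-ASCII letter.
string wordInput = "snake_case ñ 42";

// Default .NET behaviour: \w covers Unicode letters, digits and the underscore.
string defaultWord = Regex.Replace(wordInput, @"[^\w]+", "");

// ECMAScript mode: \w is limited to [a-zA-Z_0-9], so the ñ is removed too.
string ecmaWord = Regex.Replace(wordInput, @"[^\w]+", "", RegexOptions.ECMAScript);

// Explicit ASCII class: the underscore is removed as well.
string asciiWord = Regex.Replace(wordInput, @"[^A-Za-z0-9]+", "");

Console.WriteLine(defaultWord); // snake_caseñ42
Console.WriteLine(ecmaWord);    // snake_case42
Console.WriteLine(asciiWord);   // snakecase42
```

If only a-z, A-Z and 0-9 should survive, the explicit character class is the least surprising choice.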
Your character class contains a space, so spaces are excluded from the match (and you haven't used Regex.Replace(), which I had overlooked completely...):
string result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");
should work. The + makes the regex a bit more efficient by matching a run of consecutive non-alphanumeric characters at once instead of one by one.
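To make the difference between the two Replace methods concrete, here is a minimal sketch written as a top-level program. String.Replace treats its first argument as literal text, whereas Regex.Replace interprets it as a pattern; the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Hello there(hello#)";

// String.Replace searches for the literal text "[^A-Za-z0-9 ]", which never
// occurs in the input, so the string comes back unchanged.
string literal = input.Replace(@"[^A-Za-z0-9 ]", "");

// Regex.Replace interprets the pattern. With no space inside the brackets,
// spaces are removed along with the punctuation.
string cleaned = Regex.Replace(input, @"[^A-Za-z0-9]+", "");

Console.WriteLine(literal); // Hello there(hello#)
Console.WriteLine(cleaned); // Hellotherehello
```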
If you want to keep non-ASCII letters/digits, too, use the following regex:
@"[^\p{L}\p{N}]+"
which leaves
BonjourmesélèvesGutenMorgenliebeSchüler
instead of
BonjourmeslvesGutenMorgenliebeSchler
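The answer above does not show the input it used, so the sample string in this sketch is an assumption picked to reproduce the two results. It compares the ASCII-only class with the Unicode-aware one.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed input; the original answer does not show the string it started from.
string sample = "Bonjour, mes élèves! Guten Morgen, liebe Schüler.";

// ASCII-only class: accented letters count as "non-alphanumeric" and are dropped.
string asciiOnly = Regex.Replace(sample, @"[^A-Za-z0-9]+", "");

// \p{L} matches any Unicode letter and \p{N} any Unicode number, so é, è and ü stay.
string unicodeAware = Regex.Replace(sample, @"[^\p{L}\p{N}]+", "");

Console.WriteLine(asciiOnly);    // BonjourmeslvesGutenMorgenliebeSchler
Console.WriteLine(unicodeAware); // BonjourmesélèvesGutenMorgenliebeSchüler
```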
You can use LINQ to keep only the required characters:
String source = "Hello there(hello#)";
// "Hellotherehello"
String result = new String(source
    .Where(ch => Char.IsLetterOrDigit(ch))
    .ToArray());
Or
String result = String.Concat(source
    .Where(ch => Char.IsLetterOrDigit(ch)));
This way you don't need regular expressions at all.
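One thing to be aware of with the LINQ version: Char.IsLetterOrDigit follows Unicode categories, so it keeps more than a-z, A-Z and 0-9. The sketch below shows one possible way to restrict the filter to ASCII; the `ch < 128` check is an addition for illustration and not part of the answer above.

```csharp
using System;
using System.Linq;

string source = "text LaLa (lol) á ñ $ 123 ٠١٢٣٤";

// Unicode-aware filter: also keeps á, ñ and the Arabic-Indic digits.
string unicodeResult = string.Concat(source.Where(ch => char.IsLetterOrDigit(ch)));

// ASCII-only filter: below code point 128 the only letters and digits
// are a-z, A-Z and 0-9.
string asciiResult = string.Concat(source.Where(ch => ch < 128 && char.IsLetterOrDigit(ch)));

Console.WriteLine(unicodeResult);
Console.WriteLine(asciiResult); // textLaLalol123
```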
Or you can do this too:
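// Requires: using System.Text;
public static string RemoveNonAlphanumeric(string text)
{
    // Pre-size the builder; the result can never be longer than the input.
    StringBuilder sb = new StringBuilder(text.Length);
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        // Keep ASCII letters and digits only.
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
            sb.Append(c);
    }
    return sb.ToString();
}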
Usage:
string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ñ $ 123 ٠١٢٣٤");
//text: textLaLalol123
The mistake made above was using Replace incorrectly (it doesn't take regex, thanks CodeInChaos).
The following code should do what was specified:
Regex reg = new Regex(@"[^\p{L}\p{N}]+");//Thanks to Tim Pietzcker for regex
string regexed = reg.Replace("Hello there(hello#)", "");
This gives:
regexed = "Hellotherehello"