How do I strip non-alphanumeric characters (including spaces) from a string?

How do I strip non-alphanumeric characters from a string, and lose the spaces as well, in C# with Replace?

I want to keep a-z, A-Z, 0-9 and nothing more (not even " " spaces).

"Hello there(hello#)".Replace(regex-i-want, "");

should give

"Hellotherehello"

I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", ""); but the spaces remain.

James asked Jan 08 '12 16:01



4 Answers

Your character class [^A-Za-z0-9 ] contains a space, so spaces are excluded from the match and are never removed. Also, String.Replace() does not accept a regex; you need Regex.Replace() (which I had overlooked completely at first):

string result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");

should work. The + makes the regex a bit more efficient by matching more than one consecutive non-alphanumeric character at once instead of one by one.

If you want to keep non-ASCII letters/digits, too, use the following regex:

@"[^\p{L}\p{N}]+"

which leaves

BonjourmesélèvesGutenMorgenliebeSchüler

instead of

BonjourmeslvesGutenMorgenliebeSchler
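
To compare the two patterns side by side, here is a minimal sketch. The class and method names (AlphanumericRegexDemo, KeepAscii, KeepUnicode) are made up for illustration, and the sample sentence in the usage note is an assumption, since the original input is not shown above.

```csharp
using System.Text.RegularExpressions;

public static class AlphanumericRegexDemo
{
    // Removes everything except ASCII letters and digits (a-z, A-Z, 0-9).
    public static string KeepAscii(string input) =>
        Regex.Replace(input, @"[^A-Za-z0-9]+", "");

    // Removes everything except Unicode letters (\p{L}) and numbers (\p{N}).
    public static string KeepUnicode(string input) =>
        Regex.Replace(input, @"[^\p{L}\p{N}]+", "");
}
```

For an assumed input such as "Bonjour, mes élèves! Guten Morgen, liebe Schüler.", KeepUnicode should return the first result shown above and KeepAscii the second. This assumes the accented letters are stored as single precomposed characters; combining marks (\p{M}) in decomposed text would be stripped by both patterns.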
Tim Pietzcker answered Oct 07 '22 07:10


You can use LINQ to keep only the required characters:

  String source = "Hello there(hello#)";

  // "Hellotherehello"
  String result = new String(source
    .Where(ch => Char.IsLetterOrDigit(ch))
    .ToArray());

Or

  String result = String.Concat(source
    .Where(ch => Char.IsLetterOrDigit(ch)));  

That way you don't need regular expressions at all.
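
Note that Char.IsLetterOrDigit is Unicode-aware, so it also keeps letters such as á and digits such as ٠. If only a-z, A-Z and 0-9 should survive, as the question asks, the predicate can be swapped for an explicit range check. The following is a sketch only; AsciiFilter, IsAsciiAlphanumeric and KeepAsciiAlphanumerics are hypothetical names, not framework APIs.

```csharp
using System.Linq;

public static class AsciiFilter
{
    // Hypothetical helper: true only for a-z, A-Z and 0-9.
    private static bool IsAsciiAlphanumeric(char ch) =>
        (ch >= 'a' && ch <= 'z') ||
        (ch >= 'A' && ch <= 'Z') ||
        (ch >= '0' && ch <= '9');

    // Same LINQ filter as above, restricted to ASCII letters and digits.
    public static string KeepAsciiAlphanumerics(string source) =>
        new string(source.Where(ch => IsAsciiAlphanumeric(ch)).ToArray());
}
```

Calling AsciiFilter.KeepAsciiAlphanumerics("Hello there(hello#)") should give "Hellotherehello", and non-ASCII letters or digits in the input are dropped rather than kept.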

Dmitry Bychenko answered Oct 07 '22 06:10


Or you can use a plain loop with a StringBuilder:

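    // Requires: using System.Text; (for StringBuilder)
    public static string RemoveNonAlphanumeric(string text)
    {
        // At most text.Length characters survive, so pre-size the buffer.
        StringBuilder sb = new StringBuilder(text.Length);

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            // Keep ASCII letters and digits only; everything else is dropped.
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
                sb.Append(c);
        }

        return sb.ToString();
    }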

Usage:

string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ñ $ 123 ٠١٢٣٤");

//text: textLaLalol123
Adrianne answered Oct 07 '22 08:10


The mistake in my question was using String.Replace, which doesn't take a regex (thanks, CodeInChaos).

The following code should do what was specified:

Regex reg = new Regex(@"[^\p{L}\p{N}]+"); // Thanks to Tim Pietzcker for the regex
string regexed = reg.Replace("Hello there(hello#)", "");

This gives:

regexed = "Hellotherehello"
James answered Oct 07 '22 07:10