Removing non alpha characters

Tags:

c#

What is the best way in order to remove all non-alpha characters in C#? I have looked into Regex, but the editor doesn't seem to recognise Regex when I do:

string cleanString = "";
string dirtyString = "I don't_8 really know what ! 6 non alpha- is?";
cleanString = Regex.Replace(dirtyString, "[^A-Za-z0-9]", "");

Regex comes with a red wiggly line underneath. Is there a simple way to remove non-alpha characters, and if so, can someone provide me with a sample? I'm not sure whether loops and arrays are the way to go, or how I can identify all the non-alpha characters. I'm assuming I have to do something like: if a character isn't in A-Z or 0-9, replace it with ""?

Coder asked Jun 11 '15 22:06



2 Answers

You can do it using LINQ like so:

var cleanString = new string(dirtyString.Where(Char.IsLetter).ToArray());

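This needs using System.Linq; at the top of the file. Other Char classification methods, such as Char.IsLetterOrDigit, are documented on MSDN.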
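
As a minimal sketch of the LINQ approach, the one-liner can be wrapped in small helpers. The class and method names here (LinqCleaner, LettersOnly, LettersAndDigitsOnly) are made up for illustration; only Where, ToArray, Char.IsLetter and Char.IsLetterOrDigit come from the framework.

```csharp
using System;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class LinqCleaner
{
    // Keeps only the characters for which char.IsLetter returns true.
    public static string LettersOnly(string input)
    {
        return new string(input.Where(char.IsLetter).ToArray());
    }

    // Variant that also keeps digits, i.e. an alphanumeric filter.
    public static string LettersAndDigitsOnly(string input)
    {
        return new string(input.Where(char.IsLetterOrDigit).ToArray());
    }
}
```

With the string from the question, LettersOnly should return "Idontreallyknowwhatnonalphais" and LettersAndDigitsOnly should return "Idont8reallyknowwhat6nonalphais". Note that Char.IsLetter accepts any Unicode letter, not just A-Z.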

cubski answered Sep 24 '22 09:09



"Regex comes with a red wiggly line underneath."

Then either:

  1. The editor's compilation prediction isn't working correctly (it does sometimes get things wrong).
  2. You don't have a using System.Text.RegularExpressions; directive in the code, so the compiler can't work out that you mean System.Text.RegularExpressions.Regex when you say Regex.
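
If the second cause applies, the fix can be sketched as follows. This reuses the pattern from the question; the wrapper class and method names (RegexCleaner, StripNonAlphanumeric) are hypothetical and exist only to make the snippet self-contained.

```csharp
using System.Text.RegularExpressions; // brings the Regex class into scope

// Hypothetical wrapper; only Regex.Replace is a framework call.
public static class RegexCleaner
{
    // Removes every character that is not an ASCII letter or digit.
    public static string StripNonAlphanumeric(string dirtyString)
    {
        return Regex.Replace(dirtyString, "[^A-Za-z0-9]", "");
    }
}
```

Without the using directive, the same call should also compile if written out in full as System.Text.RegularExpressions.Regex.Replace(...).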

To return to your original question:

"What is the best way in order to remove all non-alpha characters in C#?"

Your approach is fine for small strings. Note that [^A-Za-z0-9] removes non-alphanumeric characters, while [^A-Za-z] removes non-alphabetic ones. Both assume you are already restricted to (or want to restrict yourself to) US-ASCII characters. To keep letters like á, œ, ß or δ, because you're dealing with real words rather than computer code, I'd use @"\P{L}" to strip everything except letters, or @"[^\p{L}\p{N}]" to keep all letters and numbers.
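
To make the difference between the patterns concrete, here is a small sketch that puts the ASCII-only and Unicode-aware variants side by side. The class and method names are invented for this example; the patterns are the ones discussed above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class comparing the patterns mentioned above.
public static class UnicodeCleaner
{
    // Keeps only the ASCII letters A-Z and a-z.
    public static string AsciiLettersOnly(string s)
    {
        return Regex.Replace(s, "[^A-Za-z]", "");
    }

    // Keeps every character in the Unicode "Letter" category.
    public static string UnicodeLettersOnly(string s)
    {
        return Regex.Replace(s, @"\P{L}", "");
    }

    // Keeps every character in the Unicode "Letter" or "Number" categories.
    public static string UnicodeLettersAndNumbersOnly(string s)
    {
        return Regex.Replace(s, @"[^\p{L}\p{N}]", "");
    }
}
```

For an input such as "Straße 42", the ASCII version should drop the ß and give "Strae", while the two Unicode versions should give "Straße" and "Straße42" respectively. One caveat: an accent stored as a separate combining mark belongs to the Mark category rather than Letter, so these patterns would strip it.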

If you are dealing with very large pieces of text (many kilobytes), you are better off reading them through a filtering stream that strips the characters you don't want as you go.
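
One possible shape for such a filter is sketched below. It is an assumption-laden illustration rather than a definitive implementation: the class name, method name and buffer size are made up, and it assumes "characters you want" means whatever Char.IsLetterOrDigit accepts. It copies from a TextReader to a TextWriter in fixed-size chunks, so the whole text never has to be held in memory.

```csharp
using System;
using System.IO;

// Hypothetical helper; names and the default buffer size are illustrative.
public static class StreamingCleaner
{
    // Copies text from reader to writer, keeping only letters and digits.
    public static void CopyAlphanumeric(TextReader reader, TextWriter writer, int bufferSize = 4096)
    {
        var buffer = new char[bufferSize];
        int read;

        // Read returns 0 once the end of the input has been reached.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Compact the wanted characters to the front of the buffer.
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                if (char.IsLetterOrDigit(buffer[i]))
                {
                    buffer[kept++] = buffer[i];
                }
            }

            writer.Write(buffer, 0, kept);
        }
    }
}
```

In practice the reader and writer would typically be a StreamReader and StreamWriter over files, but a StringReader and StringWriter work just as well for trying it out. Because it tests one char at a time, this sketch drops characters outside the Basic Multilingual Plane, which are stored as surrogate pairs.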

Jon Hanna answered Sep 24 '22 09:09
