What is the best way in order to remove all non-alpha characters in C#? I have looked up Regex, but Regex doesn't seem to be recognised when I do:
string cleanString = "";
string dirtyString = "I don't_8 really know what ! 6 non alpha- is?";
cleanString = Regex.Replace(dirtyString, "[^A-Za-z0-9]", "");
Regex comes with a red wiggly line underneath. Is there a way I can simply remove non-alpha characters, and if so, can someone provide me with a sample? I'm not sure if loops and arrays are the way to go, and also how can I get all non-alpha characters? I'm assuming I have to do something like: if a character doesn't equal A-Z or 0-9, then replace it with ""?
Non-Alphanumeric characters are the other characters on your keyboard that aren't letters or numbers, e.g. commas, brackets, space, asterisk and so on. Any character that is not a number or letter (in upper or lower case) is non-alphanumeric.
You can do it using LINQ like so:
var cleanString = new string(dirtyString.Where(Char.IsLetter).ToArray());
You can find the other Char classification methods (such as Char.IsLetterOrDigit) on MSDN.
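For reference, here is a minimal self-contained sketch of the LINQ approach above, applied to the string from the question. The class and method names (CharFilterDemo, KeepLetters, KeepLettersAndDigits) are made up for illustration; Char.IsLetter and Char.IsLetterOrDigit are standard System.Char methods. Both are Unicode-aware, so letters outside A-Z are kept as well.

```csharp
using System;
using System.Linq;

public static class CharFilterDemo
{
    // Keep only the characters that Char.IsLetter classifies as letters.
    public static string KeepLetters(string input) =>
        new string(input.Where(char.IsLetter).ToArray());

    // Keep letters and decimal digits.
    public static string KeepLettersAndDigits(string input) =>
        new string(input.Where(char.IsLetterOrDigit).ToArray());

    public static void Main()
    {
        string dirtyString = "I don't_8 really know what ! 6 non alpha- is?";

        Console.WriteLine(KeepLetters(dirtyString));          // Idontreallyknowwhatnonalphais
        Console.WriteLine(KeepLettersAndDigits(dirtyString)); // Idont8reallyknowwhat6nonalphais
    }
}
```

Note that spaces are removed too, since a space is neither a letter nor a digit.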
Regex comes with a red wiggly line underneath.
Then you are most likely missing
using System.Text.RegularExpressions;
in the code, so the compiler can't work out that you mean System.Text.RegularExpressions.Regex when you say Regex.
To return to your original question:
What is the best way in order to remove all non-alpha characters in C#?
The approach you take is good for small strings, though note that [^A-Za-z0-9] will remove non-alphanumerics, while [^A-Za-z] will remove non-alphabetical characters. This is assuming you are already restricted to (or want to add a restriction to) US-ASCII characters. To include letters like á, œ, ß or δ because you're dealing with real words rather than computer code, I'd use @"\P{L}" to keep only letters, or @"[^\p{L}\p{N}]" to allow all letters and numbers.
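The four patterns mentioned above can be compared side by side in a short sketch. The class and method names here (RegexCleanDemo and its Keep... helpers) and the sample string are assumptions for illustration only; Regex.Replace and the \p{L} / \p{N} Unicode category escapes are part of System.Text.RegularExpressions. The using directive at the top is what makes the bare name Regex resolve.

```csharp
using System;
using System.Text.RegularExpressions; // without this directive, the name Regex is not resolved

public static class RegexCleanDemo
{
    // Keep US-ASCII letters and digits only.
    public static string KeepAsciiAlphanumerics(string input) =>
        Regex.Replace(input, "[^A-Za-z0-9]", "");

    // Keep US-ASCII letters only.
    public static string KeepAsciiLetters(string input) =>
        Regex.Replace(input, "[^A-Za-z]", "");

    // Keep any Unicode letter (\P{L} matches every character that is NOT a letter).
    public static string KeepUnicodeLetters(string input) =>
        Regex.Replace(input, @"\P{L}", "");

    // Keep any Unicode letter or number.
    public static string KeepUnicodeLettersAndNumbers(string input) =>
        Regex.Replace(input, @"[^\p{L}\p{N}]", "");

    public static void Main()
    {
        // Hypothetical sample input containing non-ASCII letters.
        string dirtyString = "Straße 8, café!";

        Console.WriteLine(KeepAsciiAlphanumerics(dirtyString));
        Console.WriteLine(KeepAsciiLetters(dirtyString));
        Console.WriteLine(KeepUnicodeLetters(dirtyString));
        Console.WriteLine(KeepUnicodeLettersAndNumbers(dirtyString));
    }
}
```

The ASCII-only patterns drop ß and é along with the punctuation, which is why the Unicode category patterns are the safer choice for natural-language text.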
If you are dealing with a very large piece of text (many kilobytes), then you are better off reading it through a filtering stream that strips the characters you don't want as you go.
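One way to approximate the streaming idea is sketched below. It is not a full Stream subclass, just a copy loop between a TextReader and a TextWriter that filters one buffer at a time, so the whole text never has to be held in memory. The names StreamFilterDemo and CopyLettersAndDigits are hypothetical, and the choice of Char.IsLetterOrDigit as the filter is an assumption; for real files, a StreamReader and StreamWriter could be passed in instead of the in-memory reader and writer used here.

```csharp
using System;
using System.IO;

public static class StreamFilterDemo
{
    // Copies text from reader to writer in fixed-size chunks,
    // keeping only letters and digits.
    public static void CopyLettersAndDigits(TextReader reader, TextWriter writer)
    {
        char[] buffer = new char[4096];
        int read;

        // Read returns 0 once the end of the input is reached.
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                if (char.IsLetterOrDigit(buffer[i]))
                {
                    buffer[kept++] = buffer[i]; // compact the kept characters in place
                }
            }
            writer.Write(buffer, 0, kept);
        }
    }

    public static void Main()
    {
        using (var reader = new StringReader("I don't_8 really know what ! 6 non alpha- is?"))
        using (var writer = new StringWriter())
        {
            CopyLettersAndDigits(reader, writer);
            Console.WriteLine(writer.ToString()); // Idont8reallyknowwhat6nonalphais
        }
    }
}
```

Because this checks one UTF-16 char at a time, letters outside the Basic Multilingual Plane (stored as surrogate pairs) are dropped by this sketch.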