I need to remove all characters from a string that aren't in the ASCII range from 32 to 175; anything else has to be removed.
I don't know whether a regular expression is the best solution, as opposed to calling something like .Replace() or .Remove() for each invalid character, or something else.
Any help would be appreciated.
You can use
Regex.Replace(myString, @"[^\x20-\xaf]+", "");
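The regex here consists of a character class ([...]) consisting of all characters not (^ at the start of the class) in the range of U+0020 to U+00AF (32–175, expressed in hexadecimal notation). The trailing + makes each match cover a whole run of consecutive unwanted characters, so a run is removed in a single replacement rather than one character at a time. As far as regular expressions go, this one is fairly basic, but it may puzzle someone not very familiar with them.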
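To make the pattern easier to try out, here is a minimal sketch that wraps the call in a helper. The class and method names (RegexFilter, KeepRange) are made up for illustration and are not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class RegexFilter
{
    // Hypothetical helper: removes every run of characters outside U+0020..U+00AF.
    public static string KeepRange(string input)
    {
        // [^\x20-\xaf] matches any character NOT in the range 32..175;
        // + extends each match over a whole run of such characters.
        return Regex.Replace(input, @"[^\x20-\xaf]+", "");
    }
}
```

For example, RegexFilter.KeepRange("Hello\tWorld\u00e9!") returns "HelloWorld!", because the tab (9) and é (233) both fall outside the range.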
But you can go another route as well:
new string(myString.Where(c => (c >= 32) && (c <= 175)).ToArray());
Which one to pick probably depends mostly on what you're more comfortable reading. For someone without much regex experience, I'd say the second one is clearer.
A few performance measurements, 10000 rounds each, in seconds:
2000 characters, the first 143 of which are between 32 and 175:

    Regex without +               4.1171
    Regex with +                  0.4091
    LINQ, where, new string       0.2176
    LINQ, where, string.Join      0.2448
    StringBuilder (xanatos)       0.0355
    LINQ, horrible (HatSoft)      0.4917

2000 characters, all of which are between 32 and 175:

    Regex without +               0.4076
    Regex with +                  0.4099
    LINQ, where, new string       0.3419
    LINQ, where, string.Join      0.7412
    StringBuilder (xanatos)       0.0740
    LINQ, horrible (HatSoft)      0.4801
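The harness used for these numbers isn't shown, so the following is only a rough sketch of how such a measurement could be taken with Stopwatch. MiniBenchmark and TimeRounds are hypothetical names, and the single warm-up call is my own assumption, not necessarily what was done for the figures above.

```csharp
using System;
using System.Diagnostics;

public static class MiniBenchmark
{
    // Hypothetical helper: runs the action 'rounds' times and returns the elapsed seconds.
    public static double TimeRounds(Action action, int rounds)
    {
        action(); // one untimed warm-up call so JIT compilation is not measured

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < rounds; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalSeconds;
    }
}
```

A call might look like MiniBenchmark.TimeRounds(() => Regex.Replace(text, @"[^\x20-\xaf]+", ""), 10000), where text is the 2000-character test string. Absolute timings will differ between machines and runtime versions; only the relative ordering is really comparable.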
So yes, my approaches are the slowest :-). You should probably go with xanatos' answer and wrap that in a method with a nice, clear name. For inline usage or quick-and-dirty things or where performance does not matter, I'd probably use the regex.
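xanatos' answer isn't reproduced here, so the following is only my guess at what a StringBuilder-based method along those lines could look like. The class name, method name, and optional bounds parameters are all assumptions made for illustration.

```csharp
using System.Text;

public static class StringFilter
{
    // Hypothetical helper: keeps only characters with code points in [min, max].
    // The defaults correspond to the 32..175 range from the question.
    public static string RemoveCharsOutsideRange(
        string input, char min = '\u0020', char max = '\u00AF')
    {
        // Pre-size the builder; the result can never be longer than the input.
        var builder = new StringBuilder(input.Length);

        foreach (char c in input)
        {
            if (c >= min && c <= max)
            {
                builder.Append(c);
            }
        }

        return builder.ToString();
    }
}
```

Calling StringFilter.RemoveCharsOutsideRange("Hello\tWorld\u00e9!") gives "HelloWorld!", the same result as the regex and LINQ versions, with a single pass over the input.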