I have a List of strings:
List<string> _words = ExtractWords(strippedHtml);
_words contains 1799 elements, each of which is a string.
Some of the strings contain only numbers, for example:
" 2"
or "2013"
I want to remove these strings so that, in the end, the List contains only strings that include letters, not strings made up solely of digits.
A string like "001hello" is OK, but "001" is not OK and should be removed.
You can use LINQ for that:
_words = _words.Where(w => w.Any(c => !Char.IsDigit(c))).ToList();
This filters out strings that consist entirely of digits, along with empty strings. A string such as " 2" is kept, because the space is not a digit.
Or, equivalently:
_words = _words.Where(w => !w.All(char.IsDigit))
               .ToList();
For removing words that are only made of digits and whitespace:
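// Requires: using System.Text.RegularExpressions;
var good = new List<string>();
// Matches strings made up only of digits and whitespace (including the empty string).
var _regex = new Regex(@"^[\d\s]*$");
foreach (var s in _words) {
    if (!_regex.IsMatch(s))
        good.Add(s);
}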
If you want to use LINQ, something like this should do:
_words = _words.Where(w => w.Any(c => !char.IsDigit(c) && !char.IsWhiteSpace(c)))
               .ToList();
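To check the whitespace-aware LINQ filter against the examples from the question, here is a minimal self-contained sketch. The class name WordFilter and the method name RemoveNumericWords are made up for illustration and do not appear in the original code; the sketch also assumes the list contains no null entries.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, named for illustration only.
public static class WordFilter
{
    // Keeps a word only if it has at least one character that is
    // neither a digit nor whitespace. Empty strings are dropped too.
    public static List<string> RemoveNumericWords(IEnumerable<string> words)
    {
        return words
            .Where(w => w.Any(c => !char.IsDigit(c) && !char.IsWhiteSpace(c)))
            .ToList();
    }
}
```

Called with " 2", "2013", "001hello" and "001", this should return only "001hello".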
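You can use a traditional foreach and int.TryParse to detect numbers.
This avoids the overhead of Regex and LINQ and is likely to be faster, though it is worth measuring on your own data. Note that int.TryParse also accepts surrounding whitespace and a leading sign, and returns false for digit strings too large to fit in an int, so those strings are kept.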
var stringsWithoutNumbers = new List<string>();
foreach (var str in _words)
{
    int n;
    bool isNumeric = int.TryParse(str, out n);
    if (!isNumeric)
    {
        stringsWithoutNumbers.Add(str);
    }
}
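If number-parsing rules such as signs and integer range are not wanted, a plain character loop is one possible alternative that still avoids Regex and LINQ. This is only a sketch: DigitFilter, IsDigitsOrWhitespaceOnly and RemoveNumericWords are hypothetical names, and the code assumes no null entries in the list.

```csharp
using System.Collections.Generic;

// Hypothetical helper, named for illustration only.
public static class DigitFilter
{
    // True when every character is a digit or whitespace.
    // Also true for the empty string, which has no other characters.
    public static bool IsDigitsOrWhitespaceOnly(string s)
    {
        foreach (char c in s)
        {
            if (!char.IsDigit(c) && !char.IsWhiteSpace(c))
                return false;
        }
        return true;
    }

    // Copies every word that contains something other than digits and whitespace.
    public static List<string> RemoveNumericWords(IEnumerable<string> words)
    {
        var result = new List<string>();
        foreach (var w in words)
        {
            if (!IsDigitsOrWhitespaceOnly(w))
                result.Add(w);
        }
        return result;
    }
}
```

Unlike the TryParse version, this removes an over-long digit string such as "99999999999" and keeps "-5", since the minus sign is neither a digit nor whitespace.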