I have this function to extract all the words from a text:
    public static string[] GetSearchWords(string text)
    {
        string pattern = @"\S+";
        Regex re = new Regex(pattern);
        MatchCollection matches = re.Matches(text);
        string[] words = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            words[i] = matches[i].Value;
        }
        return words;
    }
I want to exclude a list of words from the returned array. The list looks like this:
    string strWordsToExclude = "if,you,me,about,more,but,by,can,could,did";
How can I modify the above function so that it does not return words that are in my list?
    string strWordsToExclude = "if,you,me,about,more,but,by,can,could,did";
    var ignoredWords = strWordsToExclude.Split(',');
    return words.Except(ignoredWords).ToArray();
I think the Except method fits your needs.
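For reference, here is a minimal sketch of how the Except call might be folded into the original regex-based function. The second parameter (wordsToExclude) is an addition for illustration and not part of the original signature, and the sketch is written with top-level statements, which assumes C# 9 or later. Keep in mind that Except is a set operation: a word that appears several times in the text is returned only once, and the default comparison is case-sensitive.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch: the original regex extraction followed by Except.
// The wordsToExclude parameter is hypothetical, added for this example.
static string[] GetSearchWords(string text, string wordsToExclude)
{
    // Each run of non-whitespace characters counts as one word.
    string[] words = Regex.Matches(text, @"\S+")
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToArray();

    string[] ignoredWords = wordsToExclude.Split(',');

    // Except yields the distinct words of the first sequence
    // that do not appear in the second.
    return words.Except(ignoredWords).ToArray();
}

string[] result = GetSearchWords(
    "if you read about cooking you can cook",
    "if,you,me,about,more,but,by,can,could,did");

Console.WriteLine(string.Join(", ", result)); // read, cooking, cook
```

If repeated words need to be preserved, a Where filter is the safer choice, since Except removes duplicates.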
If you aren't forced to use Regex, you can use a little LINQ:
    void Main()
    {
        var wordsToExclude = "if,you,me,about,more,but,by,can,could,did".Split(',');
        string str = "if you read about cooking you can cook";
        var newWords = GetSearchWords(str, wordsToExclude); // read, cooking, cook
    }

    string[] GetSearchWords(string text, IEnumerable<string> toExclude)
    {
        var words = text.Split();
        return words.Where(word => !toExclude.Contains(word)).ToArray();
    }
I'm assuming a word is a series of non-whitespace characters.
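If the exclusion list grows, or the text may contain capitalised words or repeated spaces, one possible variation is to load the list into a HashSet<string> with a case-insensitive comparer. This is only a sketch under a few assumptions: the choice of StringComparer.OrdinalIgnoreCase, the RemoveEmptyEntries option, and the sample sentence are mine, and the code uses top-level statements (C# 9 or later). Unlike Except, the Where filter keeps repeated words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static string[] GetSearchWords(string text, IEnumerable<string> toExclude)
{
    // Assumption: matching should ignore case, so "If" is excluded like "if".
    var excluded = new HashSet<string>(toExclude, StringComparer.OrdinalIgnoreCase);

    // An empty separator array makes Split use whitespace as the delimiter;
    // RemoveEmptyEntries drops the empty strings produced by repeated spaces.
    var words = text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);

    // HashSet lookups avoid scanning the whole exclusion list for every word.
    return words.Where(word => !excluded.Contains(word)).ToArray();
}

var wordsToExclude = "if,you,me,about,more,but,by,can,could,did".Split(',');
var newWords = GetSearchWords("If you read  about cooking you can read", wordsToExclude);

Console.WriteLine(string.Join(", ", newWords)); // read, cooking, read
```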