Extract keywords from text and exclude words

I have this function to extract all words from a text:

public static string[] GetSearchWords(string text)
{

    string pattern = @"\S+";
    Regex re = new Regex(pattern);

    MatchCollection matches = re.Matches(text);
    string[] words = new string[matches.Count];
    for (int i=0; i<matches.Count; i++)
    {
        words[i] = matches[i].Value;
    }
    return words;
}

I want to exclude a list of words from the returned array. The list looks like this:

string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";

How can I modify the function above so that it does not return words that are in my list?

Mario asked Mar 21 '23 15:03



2 Answers

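string strWordsToExclude = "if,you,me,about,more,but,by,can,could,did";
var ignoredWords = strWordsToExclude.Split(',');
// Use this in place of the final "return words;" in GetSearchWords (needs "using System.Linq;").
return words.Except(ignoredWords).ToArray();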

I think the Except method fits your needs.
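For context, here is a minimal sketch of how those lines might slot into the function from the question. The class name KeywordExtractor and the WordsToExclude constant are hypothetical, introduced only for this illustration. One thing to be aware of: Except is a set difference, so a word that occurs several times in the text comes back only once, and the comparison is case-sensitive by default.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used only for this illustration.
public static class KeywordExtractor
{
    // Same word list as in the question.
    private const string WordsToExclude = "if,you,me,about,more,but,by,can,could,did";

    public static string[] GetSearchWords(string text)
    {
        // \S+ matches each run of non-whitespace characters, as in the original function.
        var words = Regex.Matches(text, @"\S+")
                         .Cast<Match>()
                         .Select(m => m.Value);

        var ignoredWords = WordsToExclude.Split(',');

        // Except is a set difference: excluded words are dropped and
        // repeated words are returned only once. Matching is case-sensitive.
        return words.Except(ignoredWords).ToArray();
    }
}
```

If repeated words should be kept, a Where filter is probably a better fit than Except.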

Selman Genç answered Apr 01 '23 18:04



If you aren't forced to use Regex, you can use a little LINQ:

void Main()
{
    var wordsToExclude = "if,you,me,about,more,but,by,can,could,did".Split(',');

    string str = "if you read about cooking you can cook";

    var newWords = GetSearchWords(str, wordsToExclude); // read, cooking, cook
}



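string[] GetSearchWords(string text, IEnumerable<string> toExclude)
{
    // Split on any whitespace and drop the empty entries that repeated whitespace would produce.
    var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

    return words.Where(word => !toExclude.Contains(word)).ToArray();
}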

I'm assuming a word is a series of non-whitespace characters.
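If the exclusion list grows, or if "If" should be treated the same as "if", one possible variant is to put the excluded words in a HashSet with a case-insensitive comparer. This is only a sketch: the class name SearchWords is hypothetical, and case-insensitive matching is an assumption that the question does not ask for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper class, used only for this illustration.
public static class SearchWords
{
    public static string[] GetSearchWords(string text, IEnumerable<string> toExclude)
    {
        // A HashSet gives constant-time lookups; OrdinalIgnoreCase makes "If" match "if".
        // Case-insensitive matching is an assumption, not a requirement from the question.
        var excluded = new HashSet<string>(toExclude, StringComparer.OrdinalIgnoreCase);

        return text
            // A null separator array means "split on whitespace";
            // RemoveEmptyEntries skips the gaps left by repeated whitespace.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            // Where keeps repeated words, unlike a set operation such as Except.
            .Where(word => !excluded.Contains(word))
            .ToArray();
    }
}
```

Called as SearchWords.GetSearchWords(str, wordsToExclude), this keeps words in their original order and keeps repeats.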

pcnThird answered Apr 01 '23 17:04
