Replace multiple words in a string from a list of words

Tags: string, c#, replace

I have a list of words:

    string[] BAD_WORDS = { "xxx", "o2o" }; // My actual list is much bigger, about 100 words

I also have some text (usually short, at most 250 words) from which I need to remove all of the BAD_WORDS.

I have tried this:

    foreach (var word in BAD_WORDS)
    {
        string w = string.Format(" {0} ", word);
        if (input.Contains(w))
        {
            while (input.Contains(w))
            {
                input = input.Replace(w, " ");
            }
        }
    }

But if the text starts or ends with a bad word, that word is not removed. I surrounded each word with spaces so that partial words are not matched: for example, "oxxx" should not be removed, since it is not an exact match for any of the BAD_WORDS.

Can anyone give me some advice on this?

Asked Sep 01 '12 09:09 by Rafael Herscovici


3 Answers

    // requires: using System.Text.RegularExpressions;
    string cleaned = Regex.Replace(input, "\\b" + string.Join("\\b|\\b", BAD_WORDS) + "\\b", "");
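A slightly more defensive sketch of the same regex idea, written as a hypothetical BadWordFilter.RemoveWords helper (the name is illustrative only). It passes each word through Regex.Escape so characters such as "." are matched literally, wraps the alternation in a non-capturing group, and collapses the double spaces a removed word leaves behind. It still relies on \b, so it assumes every bad word starts and ends with a letter, digit or underscore.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordFilter
{
    // Hypothetical helper: removes whole-word matches of any entry in badWords.
    public static string RemoveWords(string input, string[] badWords)
    {
        // Regex.Escape makes characters like "." or "+" match literally
        // instead of being read as regex syntax.
        string alternation = string.Join("|", badWords.Select(Regex.Escape));

        // \b(?:...)\b only matches at word boundaries, so "oxxx" is left alone.
        string pattern = @"\b(?:" + alternation + @")\b";
        string cleaned = Regex.Replace(input, pattern, "");

        // Collapse runs of whitespace left behind by removed words, then trim the ends.
        return Regex.Replace(cleaned, @"\s{2,}", " ").Trim();
    }
}
```

The final \s{2,} replacement collapses every run of two or more whitespace characters, including line breaks, into a single space; skip that step if the original layout must be preserved.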
Answered Oct 01 '22 22:10 by shannon


This is a good fit for LINQ combined with the Split method. Try this:

    // requires: using System.Linq;
    return string.Join(" ", input.Split(' ').Where(w => !BAD_WORDS.Contains(w)));
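The split-based approach can be taken a little further, sketched here as a hypothetical SplitFilter.RemoveWords helper (an illustrative name, not part of any library). A HashSet<string> replaces the linear search through the array, and splitting with StringSplitOptions.RemoveEmptyEntries keeps removed words from leaving double spaces. Like the one-liner, it treats only plain spaces as separators, so a token with attached punctuation such as "xxx," is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitFilter
{
    // Hypothetical helper built on the Split/Where idea.
    public static string RemoveWords(string input, IEnumerable<string> badWords)
    {
        // HashSet lookups are O(1) on average, which helps once the list has ~100 words.
        // Swap in StringComparer.OrdinalIgnoreCase if case-insensitive matching is wanted
        // (an assumption; the question does not say).
        var bad = new HashSet<string>(badWords, StringComparer.Ordinal);

        // RemoveEmptyEntries drops the empty tokens produced by repeated spaces,
        // so the joined result has single spaces only.
        var kept = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                        .Where(token => !bad.Contains(token));

        return string.Join(" ", kept);
    }
}
```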
Answered Oct 02 '22 00:10 by James Ellis-Jones


You could use the StartsWith and EndsWith methods, like this:

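    // Replaces the inner loop of the question's foreach (w == " " + word + " ")
    while (input.Contains(w))
    {
        input = input.Replace(w, " ");
    }
    if (input.StartsWith(word + " ")) input = input.Substring(word.Length + 1);
    if (input.EndsWith(" " + word)) input = input.Substring(0, input.Length - word.Length - 1);
    if (input == word) input = "";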

Hope this will fix your problem.
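Another way to handle the start and end of the text is to keep the question's space-delimited loop but pad the input with one space on each side before replacing, then trim afterwards. This is an alternative to the StartsWith/EndsWith checks, sketched as a hypothetical PaddedReplace.RemoveWords helper (the name is illustrative only).

```csharp
public static class PaddedReplace
{
    // Hypothetical helper: the question's space-delimited Replace loop, plus padding.
    public static string RemoveWords(string input, string[] badWords)
    {
        // Pad both ends so a bad word at the very start or end is also
        // surrounded by spaces and matches " word ".
        string padded = " " + input + " ";

        foreach (var word in badWords)
        {
            string w = " " + word + " ";
            // Loop because Replace does not rescan its own output: in " xxx xxx "
            // the two matches share a space, so only one is removed per pass.
            while (padded.Contains(w))
            {
                padded = padded.Replace(w, " ");
            }
        }

        // Strip the padding (and any spaces left at the edges).
        return padded.Trim();
    }
}
```

Only the space character counts as a separator here, and the final Trim also removes any whitespace the original text had at its edges.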

Answered Oct 01 '22 22:10 by Kundan Singh Chouhan