
How to cut specified words from string

Tags:

c#

algorithm

There is a list of banned words (or strings, to be more general) and another list of, let's say, users' emails. I would like to remove all banned words from all emails.

Trivial example:

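foreach (string word in wordsList)
{
   // Assumes mailList is a List<string>; an index is needed to write back.
   for (int i = 0; i < mailList.Count; i++)
   {
      // Strings are immutable: Replace returns a new string, so store it.
      mailList[i] = mailList[i].Replace(word, String.Empty);
   }
}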

How can I improve this algorithm?


Thanks for the advice. I upvoted a few answers but didn't mark any as accepted, since this was more of a discussion than a solution. Some people confused banned words with bad words. In my case I don't have to worry about recognizing 'sh1t' or anything like that.

zgorawski asked Oct 05 '10 14:10


2 Answers

Simple approaches to profanity filtering won't work - complex approaches don't work, for the most part, either.

What happens when you get a word like 'password' and you want to filter out 'ass'? What happens when some clever person writes 'a$$' instead? The intent is still clear, right?

See "How do you implement a good profanity filter?" for extensive discussion.
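To make the 'password' case concrete, here is a minimal sketch comparing plain substring removal with whole-word removal. The class and method names are hypothetical, chosen only for illustration; nothing here comes from the original post beyond the example words.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class SubstringVsWholeWord
{
    // Plain substring removal: strips the banned text wherever it occurs,
    // including inside longer, innocent words.
    public static string RemoveSubstring(string text, string banned)
    {
        return text.Replace(banned, string.Empty);
    }

    // Whole-word removal: \b only matches at a word boundary, so the banned
    // text is left alone when it is embedded in a longer word.
    public static string RemoveWholeWord(string text, string banned)
    {
        string pattern = @"\b" + Regex.Escape(banned) + @"\b";
        return Regex.Replace(text, pattern, string.Empty, RegexOptions.IgnoreCase);
    }
}
```

With these helpers, RemoveSubstring("my password", "ass") mangles the text into "my pword", while RemoveWholeWord leaves it untouched. Neither one does anything about 'a$$', which is the point above: matching literal strings cannot capture intent.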

Steve Townsend answered Nov 03 '22 02:11


You could use RegEx to make things a little cleaner:

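// Requires: using System.Text.RegularExpressions;
var bannedWords = @"\b(this|is|the|list|of|banned|words)\b";

foreach (var mail in mailList)
{
    var clean = Regex.Replace(mail, bannedWords, "", RegexOptions.IgnoreCase); // returns a new string; store or use clean
}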

Even that, though, is far from perfect since people will always figure out a way around any type of filter.
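If the banned strings live in a list rather than a hand-written pattern, the same idea might be sketched as follows. BannedWordFilter, BuildPattern and Clean are hypothetical names, and the sketch assumes the banned entries begin and end with letters or digits, since \b behaves differently next to punctuation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class BannedWordFilter
{
    // Builds one alternation pattern from the banned list. Regex.Escape keeps
    // characters such as '.' or '$' in an entry from acting as regex syntax.
    public static Regex BuildPattern(IEnumerable<string> bannedWords)
    {
        string alternation = string.Join("|", bannedWords.Select(w => Regex.Escape(w)));
        return new Regex(@"\b(?:" + alternation + @")\b", RegexOptions.IgnoreCase);
    }

    // Builds the pattern once, then makes a single pass over each mail and
    // returns the cleaned copies; the input strings are not modified.
    public static List<string> Clean(IEnumerable<string> mails, IEnumerable<string> bannedWords)
    {
        Regex pattern = BuildPattern(bannedWords);
        return mails.Select(mail => pattern.Replace(mail, string.Empty)).ToList();
    }
}
```

Building the pattern once means each mail is scanned a single time instead of once per banned word. Removed words leave their surrounding spaces behind, so the output may contain double spaces that need a separate cleanup step.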

Justin Niessner answered Nov 03 '22 02:11